PM2.5 Forecast Correction Based on Multi-Machine Learning
Abstract: Fine particulate matter (PM2.5) pollution profoundly affects human health, atmospheric visibility, and climate, so refined forecasting of PM2.5 concentration is essential. This study built PM2.5 forecast correction models for the Beijing-Tianjin-Hebei region. The models sit on top of version 3.0 of the China Meteorological Administration Unified Atmospheric Chemistry Environment for Haze (CUACE-Haze 3.0) numerical forecast system. They use several machine learning methods: random forest (RF), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). The forecast skill and performance differences of these models were then compared. The mean error (ME) and mean absolute error (MAE) of all three machine learning models were markedly lower than those of the CUACE model. ME and MAE also varied less across forecast lead times, indicating that the machine-learning-corrected PM2.5 forecasts are more stable. Among the three algorithms, RF performed best, with an ME of -3.0 μg·m⁻³ and an MAE of 23.6 μg·m⁻³. Its MAE improvement was the largest, at 11.5%, and the proportion of stations in the region with positive correction reached 97.7%, clearly better than LightGBM and XGBoost. In a verification of forecasts for the haze episode of March 9-12, 2024, RF also achieved the highest threat scores (TS): 0.43 for light pollution, 0.19 for moderate pollution, and 0.03 for heavy pollution or above. RF therefore gives the best forecast performance of the three algorithms, and the findings offer a useful reference for operational forecasting.
Key words:
- machine learning /
- PM2.5 /
- CUACE-Haze 3.0 /
- correction
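The abstract reports ME, MAE, an MAE improvement rate, a share of stations with positive correction, and category threat scores, but this section does not give their formulas. The sketch below computes them under common verification definitions, all of which are assumptions.

- ME is taken as the mean of forecast minus observation, so negative values mean under-forecasting.
- The MAE improvement is taken as the relative drop from the raw CUACE MAE.
- A station counts as "positively corrected" when its corrected MAE is below its raw MAE.
- TS is computed as hits / (hits + misses + false alarms).

The pollution category bounds in `CATEGORIES` are hypothetical. They are borrowed from the 24-hour PM2.5 breakpoints of China's AQI regulation (HJ 633-2012), and whether the paper uses these thresholds is not stated here.

```python
"""Verification scores for PM2.5 forecasts: a minimal sketch.

Assumed definitions (not confirmed by the source):
  ME  = mean(forecast - observation)
  MAE = mean(|forecast - observation|)
  MAE improvement = (raw MAE - corrected MAE) / raw MAE * 100
  TS  = hits / (hits + misses + false alarms)
"""


def mean_error(fcst, obs):
    """Mean error (bias); negative values mean the forecast is too low."""
    return sum(f - o for f, o in zip(fcst, obs)) / len(obs)


def mean_absolute_error(fcst, obs):
    """Average magnitude of the forecast error."""
    return sum(abs(f - o) for f, o in zip(fcst, obs)) / len(obs)


def mae_improvement(raw_mae, corrected_mae):
    """Percentage reduction in MAE relative to the raw model forecast."""
    return (raw_mae - corrected_mae) / raw_mae * 100.0


def positive_correction_share(station_pairs):
    """Share (%) of stations whose corrected MAE is below the raw MAE.

    station_pairs: iterable of (raw_mae, corrected_mae), one per station.
    """
    pairs = list(station_pairs)
    improved = sum(1 for raw, corr in pairs if corr < raw)
    return improved / len(pairs) * 100.0


# Hypothetical category bounds in ug/m3 (lower inclusive, upper exclusive),
# assumed from the HJ 633-2012 24-hour PM2.5 breakpoints.
CATEGORIES = {
    "light": (75.0, 115.0),
    "moderate": (115.0, 150.0),
    "heavy_or_above": (150.0, float("inf")),
}


def threat_score(fcst, obs, lower, upper):
    """TS for the event 'lower <= concentration < upper'."""
    hits = misses = false_alarms = 0
    for f, o in zip(fcst, obs):
        f_yes = lower <= f < upper
        o_yes = lower <= o < upper
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # TS is undefined when the event is neither forecast nor observed.
    return hits / denom if denom else float("nan")


if __name__ == "__main__":
    obs = [60, 90, 130, 170, 100]   # hypothetical observations
    fcst = [50, 80, 120, 140, 110]  # hypothetical corrected forecasts
    print(mean_error(fcst, obs))           # -10.0
    print(mean_absolute_error(fcst, obs))  # 14.0
    for name, (lo, hi) in CATEGORIES.items():
        print(name, threat_score(fcst, obs, lo, hi))
```

Light and moderate pollution are scored here as bands and "heavy or above" as an open-ended threshold. This also is an assumption about how the categories were scored.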
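Table 1 Forecast fields taken from the numerical models

| Source | Level | Variables |
|---|---|---|
| ECMWF | 500, 700, 850 hPa | Geopotential height, zonal wind component, meridional wind component, temperature, relative humidity, vertical velocity, absolute humidity |
| ECMWF | Surface | 10 m zonal wind component, 10 m meridional wind component, 100 m zonal wind component, 100 m meridional wind component, 2 m temperature, sea-level pressure, 2 m dew-point temperature |
| CMA-CUACE-Haze 3.0 | Surface | PM2.5, PM10, SO2, NO2, VIS (visibility) |

Table 2 LightGBM parameter search ranges and optimal values (Beijing as an example)

| Parameter | Search range | Optimal value |
|---|---|---|
| num_leaves (maximum number of leaves per tree) | [10, 20, 30, 40, 50] | 30 |
| learning_rate (learning rate) | [0.05, 0.1, 0.3, 0.6, 0.9] | 0.6 |
| feature_fraction (fraction of features randomly selected for each tree) | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] | 0.4 |
| bagging_fraction (fraction of data used in each iteration) | [0.1, 0.3, 0.5, 0.7, 0.9] | 0.7 |
| max_depth (maximum tree depth) | [2, 6, 15] | 6 |

Table 3 Performance of PM2.5 correction models based on different machine learning algorithms (Beijing station as an example)

| Algorithm | ME (μg·m⁻³) | MAE (μg·m⁻³) | R² |
|---|---|---|---|
| RF | -4.3 | 15.7 | 0.76 |
| LightGBM | -5.7 | 18.3 | 0.72 |
| XGBoost | -5.9 | 17.9 | 0.71 |

Table 4 Performance of PM2.5 correction models based on different machine learning algorithms over the Beijing-Tianjin-Hebei region

| Method | ME (μg·m⁻³) | MAE (μg·m⁻³) | R² |
|---|---|---|---|
| CUACE | -8.7 | 30.9 | 0.53 |
| RF | -3.0 | 23.6 | 0.71 |
| LightGBM | -3.3 | 24.1 | 0.68 |
| XGBoost | -3.2 | 24.9 | 0.66 |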
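Table 2 lists a discrete search range for five LightGBM parameters. This section does not say how the search was carried out. The sketch below assumes an exhaustive grid search that minimises a caller-supplied score.

- In real use, `score_fn` would train a LightGBM model with the given parameters and return a validation error such as MAE.
- Here `toy_score` is a hypothetical stand-in whose minimum is placed at the Beijing optimum from Table 2.
- The parameter names match LightGBM's documented options. Note that in LightGBM, `bagging_fraction` only takes effect when `bagging_freq` is also set to a non-zero value.

```python
"""Exhaustive grid search over the LightGBM ranges in Table 2 (a sketch).

Assumption: every combination is scored and the lowest score wins. The
scoring function is supplied by the caller; in practice it would train a
LightGBM model and return a validation error (lower is better).
"""
from itertools import product

PARAM_GRID = {
    "num_leaves": [10, 20, 30, 40, 50],
    "learning_rate": [0.05, 0.1, 0.3, 0.6, 0.9],
    "feature_fraction": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "bagging_fraction": [0.1, 0.3, 0.5, 0.7, 0.9],
    "max_depth": [2, 6, 15],
}


def iter_grid(grid):
    """Yield every parameter combination in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Return (best_params, best_score), minimising score_fn."""
    best_params, best_score = None, float("inf")
    for params in iter_grid(grid):
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # 5 * 5 * 6 * 5 * 3 combinations
    print(sum(1 for _ in iter_grid(PARAM_GRID)))  # 2250

    # Hypothetical toy score standing in for "train LightGBM, return MAE";
    # its minimum is placed at the Beijing optimum reported in Table 2.
    target = {"num_leaves": 30, "learning_rate": 0.6, "feature_fraction": 0.4,
              "bagging_fraction": 0.7, "max_depth": 6}

    def toy_score(p):
        return sum(abs(p[k] - target[k]) for k in target)

    print(grid_search(PARAM_GRID, toy_score))
```

Scoring 2250 combinations per station means 2250 model fits, which can be costly across many stations. A randomised search over the same ranges is a cheaper alternative, though this section does not say which approach the study used.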