Assessment of Direct Economic Losses from Tropical Cyclones Based on Explainable Artificial Intelligence (XAI)
-
Abstract: Explainable artificial intelligence (XAI) has become an important direction in artificial intelligence research. It helps explain how models arrive at their predictions and decisions, and it has considerable value for the assessment of meteorological disasters. This study uses machine learning (ML) algorithms to assess the direct economic losses caused by tropical cyclones (TCs) and applies the XAI method SHAP (SHapley Additive exPlanations) to analyze, at both the global and local levels, how feature variables influence and contribute to model predictions. The random forest (RF) model outperformed the LightGBM (Light Gradient Boosting Machine) model on all three evaluation metrics, with a root mean square error (RMSE) of 23.6, a mean absolute error (MAE) of 11.1, and a coefficient of determination (R²) of 0.9. The feature contribution analysis shows that hazard indicators played a larger role in predicting TC economic losses than exposure and vulnerability indicators or disaster risk reduction capacity indicators.
According to the SHAP values, the three most important factors in the RF model were maximum wind speed (H3), maximum daily rainfall (H1), and the proportion of rainstorm stations (H2). Maximum wind speed (H3) contributed notably more than the other indicators, which underlines its importance in assessing economic losses from TCs. Specifically, samples in which maximum wind speed (H3) exceeded 45 m·s⁻¹, maximum daily rainfall (H1) exceeded 250 mm, and the proportion of rainstorm stations (H2) exceeded 30% tended to make large positive contributions to the predicted TC direct economic loss, as indicated by their higher SHAP values. These results can give decision-makers and policy planners a scientific basis for developing disaster risk management strategies.
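The additive attribution behind SHAP can be illustrated with a brute-force Shapley computation. The following is a minimal sketch, not the study's pipeline: `toy_loss_model` is a hypothetical stand-in for the trained RF model, its coefficients and the sample and baseline values are invented for illustration, and features absent from a coalition are set to a single baseline point (SHAP libraries typically average over a background dataset instead). Only the three thresholds (45 m·s⁻¹, 250 mm, 30%) come from the abstract.

```python
import itertools
import math


def toy_loss_model(x):
    """Hypothetical stand-in for a trained model; NOT the RF from the study."""
    loss = 5.0
    if x["H3"] > 45:    # maximum wind speed, m/s
        loss += 40.0
    if x["H1"] > 250:   # maximum daily rainfall, mm
        loss += 20.0
    if x["H2"] > 30:    # proportion of rainstorm stations, %
        loss += 10.0
    if x["H3"] > 45 and x["H1"] > 250:  # invented wind-rain interaction
        loss += 12.0
    return loss


def shapley_values(model, sample, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition take their baseline value.
    """
    names = list(sample)
    n = len(names)

    def value(coalition):
        x = {k: (sample[k] if k in coalition else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                without = set(subset)
                total += weight * (value(without | {name}) - value(without))
        phi[name] = total
    return phi


# Illustrative values only; not data from the study.
sample = {"H1": 300.0, "H2": 35.0, "H3": 50.0}
baseline = {"H1": 100.0, "H2": 10.0, "H3": 25.0}

phi = shapley_values(toy_loss_model, sample, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:+.1f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the local accuracy property that makes per-sample SHAP values readable as positive or negative contributions to a predicted loss.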
-
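Table 1. Parameter tuning ranges and optimal parameter combination for the RF model

| Parameter | Meaning | Tuning range | Optimal value |
| --- | --- | --- | --- |
| min_samples_split | Minimum number of samples required to split an internal node | [2, 5, 7, 9, 10] | 7 |
| max_depth | Maximum depth of each decision tree | [3, 4, 5, 6, 7] | 5 |
| min_samples_leaf | Minimum number of samples required at a leaf node | [1, 3, 5, 7, 9] | 5 |
| max_features | Number of features considered when searching for the best split | [2, 4, 6, 8, 10] | 6 |
| n_estimators | Number of trees in the forest | [10, 50, 100, 150, 200] | 150 |

Table 2. Evaluation of TC direct economic loss predictions from different ML models

| Model | RMSE | MAE | R² |
| --- | --- | --- | --- |
| LightGBM | 38.3 | 17.9 | 0.7 |
| RF | 23.6 | 11.1 | 0.9 |
| SVR | 74 | 29.3 | 0.3 |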
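The search space in Table 1 and the three metrics in Table 2 can be sketched in plain Python. This is a hedged illustration under stated assumptions: the grid values and the optimal combination are copied from Table 1, but the source does not say how the search was run, so exhaustive enumeration is assumed here, and the `y_true` and `y_pred` values are invented to exercise the metric functions.

```python
import itertools
import math

# Search space copied from Table 1.
PARAM_GRID = {
    "min_samples_split": [2, 5, 7, 9, 10],
    "max_depth": [3, 4, 5, 6, 7],
    "min_samples_leaf": [1, 3, 5, 7, 9],
    "max_features": [2, 4, 6, 8, 10],
    "n_estimators": [10, 50, 100, 150, 200],
}

# Optimal combination reported in Table 1.
BEST_PARAMS = {
    "min_samples_split": 7,
    "max_depth": 5,
    "min_samples_leaf": 5,
    "max_features": 6,
    "n_estimators": 150,
}


def grid_combinations(grid):
    """Yield every parameter combination in the grid as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Number of candidate models an exhaustive search over Table 1 would fit.
n_candidates = sum(1 for _ in grid_combinations(PARAM_GRID))

# Invented loss values for illustration only; not data from the study.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

print(n_candidates)
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

With five values for each of five parameters, an exhaustive search evaluates 3125 combinations before any cross-validation folds are counted. Lower RMSE and MAE and a higher R² indicate a better fit, which is the basis for ranking RF above LightGBM and SVR in Table 2.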