THE CORRELATION FACTORS OF THE CHANGE OF AMBIENT AIR NO2 CONCENTRATION: AN ANALYSIS BASED ON IMPROVED APRIORI ALGORITHM
Abstract: Ambient air pollution is a complex, non-linear, dynamic phenomenon affected by factors such as traffic flow on the road network and meteorological conditions. Quantifying how these factors relate to changes in air pollutant concentrations is an important basis for air quality prediction and improvement. This study takes the area within a 1.5 km radius of the air quality monitoring station at the Meteorological Bureau of Nanhai District, Foshan, Guangdong Province, as its study area. An improved association rule algorithm is applied to the traffic flow on roads lying in different directions from the station, together with meteorological conditions, to quantitatively analyze their influence on the concentration of the air pollutant NO2, and the association rule results are verified by linear correlation analysis. The results show that: (1) The traditional Apriori algorithm is computationally inefficient, and most of the rules it produces are invalid or unreliable. The improved algorithm replaces repeated scans of the database with scans of the elements of the previous frequent k-itemsets, so that the database needs to be scanned only once. It also introduces "lift" as an additional reliability measure and adds a step that screens the resulting association rules. These two improvements raise the computational efficiency of the Apriori algorithm and make the mined strong association rules more reliable. (2) The strong association rules indicate that the main factors affecting changes in NO2 concentration are wind speed, temperature, and air pressure: wind speed and temperature are negatively correlated with changes in NO2 concentration, whereas air pressure is positively correlated. (3) When road traffic flow is considered together with meteorological factors, heavy traffic under good dispersion conditions does not cause the pollutant to rise rapidly: when traffic flow is high and accompanied by low air pressure, high wind speed, or high temperature, NO2 remains at the low concentration level, with high confidence (90%~100%). Light traffic under poor meteorological conditions, by contrast, lets the pollutant accumulate gradually: when traffic flow is low and accompanied by high air pressure, low wind speed, or low temperature, NO2 dispersion is hindered, although the quantified confidence shows some deviation. When wind direction is taken into account, the influence of a road on NO2 concentration also differs depending on whether it lies upwind or downwind. (4) Linear fits between NO2 and the key influencing factors identified by the strong association rules, together with the corresponding Pearson correlation coefficients, give results consistent with the conclusions of the association rule algorithm. These results indicate that the association rule algorithm is efficient and accurate in mining quantitative relationships and can provide technical support for regional air quality management and forecasting.

Keywords:
- air pollutant NO2
- correlation factors
- improved Apriori algorithm
- traffic flow and meteorology
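The improvement summarized in point (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which is not given here. It assumes one plausible reading of "scan the database once": a single pass stores, for each item, the set of row IDs that contain it, and every later support count is obtained by intersecting the row-ID sets of the previous frequent k-itemsets. The function names, thresholds, and toy transactions are hypothetical.

```python
from itertools import combinations


def build_tidsets(transactions):
    """One pass over the database: map each single item to the IDs of the rows containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support}, growing level k+1 only from the frequent k-itemsets."""
    n = len(transactions)
    min_count = min_support * n
    current = {s: t for s, t in build_tidsets(transactions).items() if len(t) >= min_count}
    frequent = dict(current)
    while current:
        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset of the candidate must itself be frequent.
            if any(candidate - {item} not in current for item in candidate):
                continue
            # Support comes from intersecting stored row IDs, not from rescanning the data.
            tids = current[a] & current[b]
            if len(tids) >= min_count:
                next_level[candidate] = tids
        frequent.update(next_level)
        current = next_level
    return {s: len(t) / n for s, t in frequent.items()}


def strong_rules(supports, targets, min_conf, min_lift=1.0):
    """Keep rules 'antecedent -> target' that pass both the confidence and the lift screen."""
    rules = []
    for itemset, supp in supports.items():
        for consequent in itemset & targets:
            antecedent = itemset - {consequent}
            if not antecedent:
                continue
            conf = supp / supports[antecedent]
            lift = conf / supports[frozenset([consequent])]
            if conf >= min_conf and lift > min_lift:
                rules.append((sorted(antecedent), consequent, conf, lift))
    return rules


# Hypothetical toy transactions using the level codes of the paper's tables.
toy = [
    {"A4", "P1", "W2", "N1"}, {"A4", "P1", "W2", "N1"}, {"A4", "P1", "W3", "N1"},
    {"A2", "P3", "W1", "N4"}, {"A2", "P3", "W1", "N4"}, {"A2", "P2", "W1", "N4"},
    {"A4", "P3", "W1", "N4"}, {"A2", "P1", "W2", "N1"},
]
supports = frequent_itemsets(toy, min_support=0.25)
rules = strong_rules(supports, targets={"N1", "N4"}, min_conf=0.9)
```

A lift above 1 means the antecedent makes the NO2 level more likely than its base rate, which is why a rule with high confidence but a lift near 1 is screened out as uninformative.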
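Table 1. Data levels

| Variable | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| Haisan Road (海三路) traffic flow / vehicles | A1≤463 | 463<A2≤876 | 876<A3≤1085 | A4>1085 |
| Foping 2nd Road (佛平二路) traffic flow / vehicles | B1≤376 | 376<B2≤907 | 907<B3≤1041 | B4>1041 |
| Nanhai Avenue (南海大道) traffic flow / vehicles | C1≤386 | 386<C2≤773 | 773<C3≤1201 | C4>1201 |
| Guilan Road (桂澜路) traffic flow / vehicles | D1≤531 | 531<D2≤1436 | 1436<D3≤1827 | D4>1827 |
| NO2 / (μg/m3) | N1≤20 | 20<N2≤29 | 29<N3≤45 | N4>45 |
| Wind direction / ° | X1≤90 | 90<X2≤180 | 180<X3≤270 | X4>270 |
| Wind speed / (m/s) | W1≤1 | 1<W2≤2 | W3>2 | |
| Temperature / ℃ | T1≤21.2 | 21.2<T2≤28 | T3>28 | |
| Humidity / % | R1≤53 | 53<R2≤66 | R3>66 | |
| Air pressure / hPa | P1≤1006 | 1006<P2≤1015 | P3>1015 | |

Table 2. Data transaction item sets

| No. | Haisan Road | Foping 2nd Road | Nanhai Avenue | Guilan Road | Wind direction | Wind speed | Air pressure | Temperature | Humidity | NO2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A1 | B2 | C2 | D2 | X3 | W1 | P2 | T2 | R2 | N2 |
| 2 | A1 | B1 | C2 | D1 | X3 | W1 | P2 | T2 | R2 | N2 |
| … | … | … | … | … | … | … | … | … | … | … |
| 4974 | A3 | B2 | C2 | D2 | X2 | W3 | P3 | T2 | R1 | N1 |

Table 3. Strong association rules between traffic flow, meteorology, and low NO2 concentration

| No. | Antecedent | Consequent | Confidence |
|---|---|---|---|
| 1 | P1,X3 | N1 | 1 |
| 2 | T3,W3 | N1 | 1 |
| 3 | A4,D2,X3 | N1 | 1 |
| 4 | A4,P1,W2 | N1 | 0.9711 |
| 5 | A4,B2,T3 | N1 | 1 |
| 6 | B1,W2,X3 | N1 | 1 |
| 7 | C2,P1,R1 | N1 | 1 |
| 8 | D1,T3,X3 | N1 | 1 |
| 9 | D1,W2,X3 | N1 | 1 |
| 10 | D4,P1,W2 | N1 | 0.9034 |
| 11 | D4,P1,R1 | N1 | 0.9063 |
| 12 | D4,T3,W2 | N1 | 0.9057 |
| 13 | D4,T3,X3 | N1 | 0.9094 |

Table 4. Strong association rules between traffic flow, meteorology, and medium-to-high NO2 concentration

| No. | Antecedent | Consequent | Confidence |
|---|---|---|---|
| 1 | C1,P2,T3 | N2 | 1 |
| 2 | C4,D2,W3 | N2 | 1 |
| 3 | A3,C4,T1,X1 | N2 | 1 |
| 4 | A3,C2,W3,X1 | N2 | 0.9091 |
| 5 | A4,B2,D4,T3 | N2 | 1 |
| 6 | A4,P2,T1,W3 | N2 | 1 |
| 7 | A3,P3,R3,W2 | N3 | 0.9384 |
| 8 | A4,C3,P3,R3 | N3 | 0.9661 |
| 9 | A1,C2,D3,X2 | N3 | 1 |
| 10 | B3,P3,R3,W2 | N3 | 0.8750 |
| 11 | C2,P3,R3,W2 | N3 | 0.9623 |
| 12 | D3,P3,R2,W3 | N3 | 0.8874 |
| 13 | D3,R3,W2,X4 | N3 | 1 |

Table 5. Strong association rules between traffic flow, meteorology, and high NO2 concentration

| No. | Antecedent | Consequent | Confidence |
|---|---|---|---|
| 1 | D1,R1,W1 | N4 | 1 |
| 2 | A2,B2,C1,W1 | N4 | 1 |
| 3 | A2,B1,R3,T1 | N4 | 0.9069 |
| 4 | A2,C1,T1,W1 | N4 | 1 |
| 5 | A2,C1,P2,T1 | N4 | 0.8872 |
| 6 | A2,C4,T1,W1 | N4 | 1 |
| 7 | A2,D4,P3,W1 | N4 | 1 |
| 8 | A2,D4,T1,W1 | N4 | 0.9091 |
| 9 | A3,P2,T1,W1 | N4 | 1 |
| 10 | A3,D2,T1,W1 | N4 | 0.9097 |
| 11 | A3,P3,R1,W1 | N4 | 1 |
| 12 | B1,D1,R1,W1 | N4 | 0.9074 |
| 13 | B2,R1,T1,X2 | N4 | 0.9091 |
| 14 | B4,T1,W1,X4 | N4 | 0.9167 |
| 15 | B3,D2,T1,X3 | N4 | 0.9166 |
| 16 | B3,T1,W1,X3 | N4 | 0.9091 |
| 17 | C4,T1,W1,X4 | N4 | 1 |
| 18 | C1,D2,P3,W1 | N4 | 1 |
| 19 | C1,D2,T1,W1 | N4 | 1 |
| 20 | D3,P2,T1,W1 | N4 | 1 |
| 21 | D4,T1,W1,X4 | N4 | 1 |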
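The level codes used in the rule tables can be derived from raw readings with the thresholds of Table 1. The following is a minimal sketch, assuming upper-inclusive bins as the "≤" signs suggest; the function names and the sample reading are hypothetical, and the three transactions are only the rows printed in Table 2, not the full 4974-row dataset.

```python
from bisect import bisect_left

# Upper bounds of each level, read from Table 1 (the last level is open-ended).
# The C3 bound is taken as 1201, matching the adjacent entry C4 > 1201.
UPPER_BOUNDS = {
    "A": [463, 876, 1085],   # Haisan Road traffic flow
    "B": [376, 907, 1041],   # Foping 2nd Road traffic flow
    "C": [386, 773, 1201],   # Nanhai Avenue traffic flow
    "D": [531, 1436, 1827],  # Guilan Road traffic flow
    "N": [20, 29, 45],       # NO2, ug/m3
    "X": [90, 180, 270],     # wind direction, degrees
    "W": [1, 2],             # wind speed, m/s
    "T": [21.2, 28],         # temperature, deg C
    "R": [53, 66],           # humidity, %
    "P": [1006, 1015],       # air pressure, hPa
}


def level(prefix, value):
    """Map a raw value to its level code, e.g. level('A', 463) -> 'A1'."""
    # bisect_left counts the bounds strictly below the value, so a value equal
    # to a bound stays in the lower level (upper-inclusive bins).
    return f"{prefix}{bisect_left(UPPER_BOUNDS[prefix], value) + 1}"


def encode(reading):
    """Turn a {prefix: raw value} reading into a transaction of level codes."""
    return {level(prefix, value) for prefix, value in reading.items()}


def rule_confidence(transactions, antecedent, consequent):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return None  # the antecedent never occurs, so confidence is undefined
    return sum(consequent in t for t in matching) / len(matching)


# Hypothetical reading: heavy Haisan Road traffic, low pressure, moderate wind, low NO2.
sample = encode({"A": 1100, "P": 1004, "W": 1.5, "N": 18})

# The three rows printed in Table 2.
table2_rows = [
    {"A1", "B2", "C2", "D2", "X3", "W1", "P2", "T2", "R2", "N2"},
    {"A1", "B1", "C2", "D1", "X3", "W1", "P2", "T2", "R2", "N2"},
    {"A3", "B2", "C2", "D2", "X2", "W3", "P3", "T2", "R1", "N1"},
]
conf_a1_w1 = rule_confidence(table2_rows, {"A1", "W1"}, "N2")
```

The sample reading encodes to the item set of rule 4 in Table 3 (A4, P1, W2 with consequent N1), which shows how a raw observation lines up with a mined rule.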