SURVEY ON APPLICATION OF CORRELATION ANALYSIS OF DIFFERENT CALCULATION FORMS IN METEOROLOGY
Abstract: Correlation analysis is widely used in meteorological research and operations and is an important tool for quantifying the relationships between meteorological variables. This survey first systematically reviews the correlation analysis algorithms of different calculation forms used in meteorological research and operations. It then describes recent advances in the application of correlation analysis in meteorology, in particular the full-window sliding correlation and the scanning multi-scale change-point detection algorithm for correlation coefficients developed by the authors. Next, it introduces correlation analysis algorithms newly developed in big data research and briefly discusses their implications for meteorological correlation analysis. Finally, it analyzes the existing problems in meteorological correlation analysis and gives an outlook on future trends of correlation analysis in meteorology.
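Figure 1 In panels a, b, c and d, every horizontal variable (x) has the same mean (9.0) and variance (11.0), and every vertical variable (y) has the same mean (7.5) and variance (4.12). The Pearson (ordinary) correlation coefficient is the same for all four panels, r_xy = 0.816 [2]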
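The summary statistics quoted for Figure 1 match the well-known Anscombe quartet, so the following minimal sketch assumes that the four panels are Anscombe's four data sets (the assignment of data sets to panels a to d is also an assumption). It recomputes the means, sample variances and Pearson coefficients with the Python standard library only; in these data, 11.0 and about 4.12 are the sample variances of x and y.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet (assumed to be the data behind Figure 1).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "a": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "b": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "c": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "d": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Ordinary (Pearson) correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(x, y):
    """Mean, sample variance (n - 1 denominator) and Pearson r for one panel."""
    return {
        "mean_x": mean(x), "var_x": variance(x),
        "mean_y": mean(y), "var_y": variance(y),
        "r": pearson_r(x, y),
    }


SUMMARY = {panel: summarize(x, y) for panel, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for panel, stats in SUMMARY.items():
        print(panel, {name: round(value, 3) for name, value in stats.items()})
```

The four panels share nearly identical summary statistics although their scatter plots look very different, which is why a correlation coefficient is best read together with a plot of the data.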
Figure 2 Scatter plot of two variables X and Y [23]
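Figure 3 Full-window correlation coefficients between the winter Northern Hemisphere snow cover index (SCFN) and the Arctic Oscillation index (IAO) (a), and between the winter SCFN and the Siberian High index (ISH) (b), for 1872-2010
Correlation coefficients that pass the 0.05 significance test are shown as filled contours [35].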
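A full-window correlation of the kind plotted in Figure 3 can be illustrated by computing the Pearson coefficient for every window length and every window position. The sketch below is only an illustration under that assumption, not the authors' implementation; the function name `full_window_correlation` and the `min_window` parameter are hypothetical, and the significance screening at the 0.05 level is not included.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either sample has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def full_window_correlation(x, y, min_window=3):
    """Map each window length w to the list of r values for all window starts.

    Hypothetical helper: result[w][s] is the correlation of x[s:s+w] and
    y[s:s+w], for w from min_window up to the full series length.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    result = {}
    for w in range(min_window, n + 1):
        result[w] = [pearson_r(x[s:s + w], y[s:s + w]) for s in range(n - w + 1)]
    return result


# Toy series whose relationship reverses sign halfway through.
x_demo = list(range(10))
y_demo = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
corr_demo = full_window_correlation(x_demo, y_demo)
```

In the toy series the correlation over the whole record is zero, while the five-point windows at the start and at the end give +1 and -1 respectively, which is the kind of window-dependent behavior a single full-record coefficient hides.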
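Figure 4 Wavelet coherence between the standardized AO and BMI (Baltic maximum annual sea ice extent) time series
The 5% significance level against red noise is shown as a thick contour. All significant regions show anti-phase behavior. The relative phase relationship is shown by arrows (in-phase pointing right, anti-phase pointing left, BMI leading AO by 90° pointing straight down, BMI lagging AO by 90° pointing straight up) [36].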
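Figure 5 a. Contours of the scanning U-test results for multi-scale change points in the correlation coefficient between the normalized monthly mean streamflow index (NSI) at Makou station and the normalized monthly precipitation index (NPI) of the Xijiang River basin; b. 13-point Gaussian low-pass filtered curves of NSI (pink dashed line) and NPI (green dashed line)
Change points in the correlation coefficient between sub-samples of the two series are marked by thick black vertical lines, and the correlation coefficients of the sub-periods by thick black horizontal lines [40].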
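The exact statistic of the scanning U-test is not given here. As a rough illustration of the idea of scanning for a change in correlation, the sketch below substitutes a generic technique: the Fisher z-transform test for the difference between the correlations of two adjacent windows, evaluated at every candidate point. The function names, the `half_window` parameter and the clipping constant are assumptions made for this example.

```python
from math import atanh, sqrt


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either sample has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def fisher_u(r1, n1, r2, n2, clip=0.999999):
    """Approximately standard-normal statistic for r1 versus r2 (Fisher z)."""
    z1 = atanh(max(min(r1, clip), -clip))  # clip keeps atanh finite at |r| = 1
    z2 = atanh(max(min(r2, clip), -clip))
    return (z1 - z2) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))


def scan_correlation_change(x, y, half_window):
    """Map each candidate point j to U for windows [j-h, j) and [j, j+h)."""
    if half_window < 4:
        raise ValueError("half_window must be at least 4")
    n = len(x)
    scores = {}
    for j in range(half_window, n - half_window + 1):
        r_before = pearson_r(x[j - half_window:j], y[j - half_window:j])
        r_after = pearson_r(x[j:j + half_window], y[j:j + half_window])
        scores[j] = fisher_u(r_before, half_window, r_after, half_window)
    return scores


# Toy data: y follows x up to index 9, then follows -x (plus a small wiggle).
x_demo = list(range(20))
wiggle = [0.5 if v % 2 else -0.5 for v in x_demo]
y_demo = [(v if v < 10 else -v) + w for v, w in zip(x_demo, wiggle)]
u_demo = scan_correlation_change(x_demo, y_demo, half_window=6)
best_j = max(u_demo, key=lambda j: abs(u_demo[j]))
```

Repeating the scan for several values of `half_window` would give a two-dimensional field of test statistics over time and scale, similar in spirit to the contour plot in panel a, although the published algorithm may differ in its details.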
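Table 1 General form of a contingency table
Y|X      Y1     Y2     …     Ys     Total
X1       n11    n12    …     n1s    n1·
X2       n21    n22    …     n2s    n2·
⋮        ⋮      ⋮            ⋮      ⋮
Xk       nk1    nk2    …     nks    nk·
Total    n·1    n·2    …     n·s    n
Table 2 2 × 2 contingency table for the correlation analysis between forecaster sex and forecast score level
Y|X                       Y1 / higher forecast score    Y2 / lower forecast score    Total
X1 / male forecasters     n11 = 70                      n12 = 30                     n1· = 100
X2 / female forecasters   n21 = 25                      n22 = 75                     n2· = 100
Total                     n·1 = 95                      n·2 = 105                    n = 200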
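One common way to analyze a contingency table such as Table 2 is Pearson's chi-square test of independence together with Cramér's V, which reduces to the absolute phi coefficient for a 2 × 2 table. The sketch below applies both to the counts of Table 2. The choice of this statistic is an assumption made for illustration, since the measure used with the table is not stated here, and the function names are hypothetical.

```python
from math import sqrt


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a k x s table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


def cramers_v(table):
    """Cramér's V, a 0-to-1 strength-of-association measure for a k x s table."""
    chi2, _ = chi_square_independence(table)
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * smaller_dim))


# Counts from Table 2: rows are male and female forecasters,
# columns are higher and lower forecast scores.
TABLE_2 = [[70, 30], [25, 75]]
chi2_value, dof_value = chi_square_independence(TABLE_2)
v_value = cramers_v(TABLE_2)

if __name__ == "__main__":
    # The 0.05 critical value of chi-square with 1 degree of freedom is 3.841.
    print(round(chi2_value, 2), dof_value, round(v_value, 3))
```

For these counts the statistic is far above the 0.05 critical value for one degree of freedom, so independence between the row and column variables would be rejected for this table.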
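[1] WILKS D S. Statistical Methods in the Atmospheric Sciences (3rd edn)[M]. Amsterdam: Academic Press, 2011.
[2] ZHU Yuxiang, LI Hongyi, LÜ Xing, trans. Statistical Methods in the Atmospheric Sciences (3rd edn)[M]. Beijing: China Meteorological Press, 2017 (in Chinese).
[3] WEI Qing, LI Wei, PENG Song, et al. Construction and application of the national-level weather forecast verification and analysis system[J]. Journal of Applied Meteorological Science, 2019, 30(2): 245-256 (in Chinese).
[4] ZHANG Ziyin, ZHAO Xiujuan, XIONG Yajun, et al. Medium-range forecast experiments for fog and haze in the Beijing-Tianjin-Hebei region based on a dynamic statistical forecasting method[J]. Journal of Applied Meteorological Science, 2018, 29(1): 57-69 (in Chinese).
[5] HU Bo, YU Liaoni, TENG Daigao. Application experiment of the Gaussian process regression method to winter gust forecasting for the coastal islands of Zhejiang[J]. Journal of Tropical Meteorology, 2019, 35(6): 767-779 (in Chinese).
[6] HAM Y G, KIM J H, LUO J J. Deep learning for multi-year ENSO forecasts[J]. Nature, 2019, 573: 568-572.
[7] WANG Huan, LI Dongliang. Influence of global sea surface temperature on the interdecadal shift of summer precipitation in eastern China under climate warming[J]. Journal of Tropical Meteorology, 2019, 35(3): 398-408 (in Chinese).
[8] VON STORCH H, ZWIERS F W. Statistical Analysis in Climate Research[M]. Cambridge: Cambridge University Press, 1999.
[9] WANG X, SONG L C, WANG G F, et al. Operational climate prediction in the era of big data in China: Reviews and prospects[J]. J Meteor Res, 2016, 30(3): 444-456.
[10] WEI Fengying. Modern Climate Statistical Diagnosis and Prediction Techniques[M]. Beijing: China Meteorological Press, 2007 (in Chinese).
[11] SUN Ying, YIN Hong, TIAN Qinhua, et al. Advances in detection and attribution of climate change over the globe and China during the past 50 years[J]. Climate Change Research, 2013, 9(4): 235-245 (in Chinese).
[12] SUN Y, ZHANG X B, ZWIERS F W, et al. Rapid increase in the risk of extreme summer heat in eastern China[J]. Nature Climate Change, 2014, 4: 1082-1085.
[13] WU Menxin, ZHUANG Liwei, HOU Yingyu, et al. Design and implementation of the China Agricultural Meteorological Service System (CAgMSS)[J]. Journal of Applied Meteorological Science, 2019, 30(5): 513-527 (in Chinese).
[14] YANG Chengrui, REN Fang, MA Nan. On the application of big data in meteorological services[J]. Agriculture Network Information, 2016(8): 53-55 (in Chinese).
[15] HUANG Jiayou. Meteorological Statistical Analysis and Forecasting Methods (4th edn)[M]. Beijing: China Meteorological Press, 2016 (in Chinese).
[16] DING Yihui, LI Xiao, LI Qiaoping. Advances in research on surface wind speed changes over China under climate warming[J]. Journal of Applied Meteorological Science, 2020, 31(1): 1-12 (in Chinese).
[17] GALTON F. Regression towards mediocrity in hereditary stature[J]. J Anthropol Inst, 1886, 15: 246-263.
[18] XU Liyan, WU Bingyi. Possible linkage between spring snowmelt over Eurasia and the East Asian summer monsoon[J]. Chinese Journal of Atmospheric Sciences, 2012, 36(6): 1180-1190 (in Chinese).
[19] WALLACE J M, GUTZLER D S. Teleconnections in the geopotential height field during the Northern Hemisphere winter[J]. Mon Wea Rev, 1981, 109: 784-812.
[20] DING Yihui, LIU Yunyun. A study of teleconnections in the Asian-Pacific monsoon region[J]. Acta Meteorologica Sinica, 2008, 66(5): 670-682 (in Chinese).
[21] GOODMAN L A, KRUSKAL W H. Measures of association for cross classifications, Ⅳ: Simplification of asymptotic variances[J]. Journal of the American Statistical Association, 1972, 67(338): 415-421.
[22] SPEARMAN C. The proof and measurement of association between two things[J]. The American Journal of Psychology, 1904, 15(1): 72-101.
[23] FAN Rong, MENG Dazhi, XU Dashun. Advances in statistical correlation analysis[J]. Mathematical Modeling and Its Applications, 2014, 3(1): 1-12 (in Chinese).
[24] KENDALL M G. A new measure of rank correlation[J]. Biometrika, 1938, 30(1/2): 81-93.
[25] GOODMAN L A, KRUSKAL W H. Measures of association for cross classifications[J]. Journal of the American Statistical Association, 1954, 49(268): 732-764.
[26] GOODMAN L A, KRUSKAL W H. Measures of association for cross classifications, Ⅱ: Further discussion and references[J]. Journal of the American Statistical Association, 1959, 54(285): 123-163.
[27] HUANG Jiayou, LI Qingxiang. Statistical Analysis Methods for Meteorological Data[M]. Beijing: China Meteorological Press, 2015 (in Chinese).
[28] SHI Neng. Multivariate Analysis Methods in Meteorological Research and Forecasting[M]. Beijing: China Meteorological Press, 2002 (in Chinese).
[29] SHI Neng. Meteorological Statistical Forecasting[M]. Beijing: China Meteorological Press, 2009 (in Chinese).
[30] BRETHERTON C S, SMITH C, WALLACE J M. An intercomparison of methods for finding coupled patterns in climate data[J]. J Climate, 1992, 5: 541-560.
[31] WALLACE J M, SMITH C, BRETHERTON C S. Singular value decomposition of sea-surface temperature and 500 mb heights anomalies[J]. J Climate, 1992, 5: 561-576.
[32] SHI Jiarong, YANG Liu. Meteorological data estimation based on singular value decomposition[J]. Acta Meteorologica Sinica, 2020, 78(1): 128-142 (in Chinese).
[33] DING Q H, WANG B. Circumglobal teleconnection in the Northern Hemisphere summer[J]. J Climate, 2005, 18: 3483-3505.
[34] DING Q H, WANG B. Intraseasonal teleconnection between the summer Eurasian wave train and the Indian monsoon[J]. J Climate, 2007, 20: 3751-3767.
[35] ZHAO L, ZHU Y X, LIU H W, et al. A stable snow-atmosphere coupled mode[J]. Climate Dyn, 2016, 47: 2085-2104.
[36] GRINSTED A, MOORE J, JEVREJEVA S. Application of the cross wavelet transform and wavelet coherence to geophysical time series[J]. Nonlin Process Geophys, 2004, 11: 561-566.
[37] TORRENCE C, COMPO G P. A practical guide to wavelet analysis[J]. Bull Am Meteorol Soc, 1998, 79: 61-78.
[38] JIANG Jianmin. A scanning t-test method for multi-scale abrupt changes and its coherence analysis[J]. Chinese Journal of Geophysics, 2001, 44(1): 31-39 (in Chinese).
[39] JIANG J M. Scanning detections of multi-scale significant change-points in subseries means, variances, trends and correlations[C]. Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009, 285: 609-613.
[40] ZHU Y X, JIANG J M, HUANG C X, et al. Applications of multiscale change-point detections to monthly streamflow and rainfall in Xijiang River in Southern China, Part I: correlation and variance[J]. Theor Appl Climatol, 2019, 136: 237-248.
[41] ZHANG Wei. Research and application of big data in meteorological services[J]. Computer CD Software and Applications, 2014(18): 27-28 (in Chinese).
[42] ZHAO Bei. The promotion of meteorological services in the big data era[J]. Kaoshi Zhoukan, 2014(33): 195-196 (in Chinese).
[43] FAGHMOUS J, KUMAR V. A big data guide to understanding climate change: The case for theory-guided data science[J]. Big Data, 2014, 2: 155-163.
[44] CUI Wei. Application and analysis of big data in meteorological services[J]. Low Carbon Technology, 2016(9): 121-122 (in Chinese).
[45] JI Lei, WANG Zaiwen, CHEN Min, et al. Can artificial intelligence techniques improve the accuracy of surface air temperature forecasts: A note on the AI Challenger 2018 Global Weather Forecast Challenge[J]. Acta Meteorologica Sinica, 2019, 77(5): 960-964 (in Chinese).
[46] LIANG Jiye, FENG Chenjiao, SONG Peng. A survey on correlation analysis of big data[J]. Chinese Journal of Computers, 2016, 39(1): 1-18 (in Chinese).
[47] SHENG Yangyan, ZHOU Tao, trans. The Big Data Era[M]. Hangzhou: Zhejiang People's Publishing House, 2013 (in Chinese).
[48] LU L Y, MEDO M, YEUNG C H, et al. Recommender systems[J]. Physics Reports, 2012, 519: 1-49.
[49] LU X, BENGTSSON L, HOLME P. Predictability of population displacement after the 2010 Haiti earthquake[C]. Proceedings of the National Academy of Sciences of the United States of America, 2012, 109(29): 11576-11581.
[50] HANNACHI A, JOLLIFFE I T, STEPHENSON D B. Empirical orthogonal functions and related techniques in atmospheric science: A review[J]. Int J Climatol, 2007, 27(9): 1119-1152.
[51] LI Ying, CHEN Huailiang. Application of machine learning techniques in modern agricultural meteorology[J]. Journal of Applied Meteorological Science, 2020, 31(3): 257-266 (in Chinese).
[52] WU Hongbao, WU Lei. Methods for Diagnosing and Predicting Climate Variability[M]. Beijing: China Meteorological Press, 2005 (in Chinese).
[53] SZEKELY G J, RIZZO M L, BAKIROV N K. Measuring and testing dependence by correlation of distances[J]. The Annals of Statistics, 2007, 35(6): 2769-2794.
[54] RACHERLA P N, SHINDELL D T, FALUVEGI G S. The added value to global model projections of climate change by dynamical downscaling: A case study over the continental U. S. using the GISS-ModelE2 and WRF models[J]. J Geophys Res, 2012, 117(D20118): 1-8.
[55] RESHEF D N, RESHEF Y A, FINUCANE H K, et al. Detecting novel associations in large data sets[J]. Science, 2011, 334: 1518-1524.
[56] YANG Jing, LI Wenping, ZHANG Jianpei. A cloud model method for canonical correlation analysis of big data[J]. Journal on Communications, 2013, 34(10): 121-134 (in Chinese).
[57] NGUYEN H V, MÜLLER E, BÖHM K. A near-linear time subspace search scheme for unsupervised selection of correlated features[J]. Big Data Research, 2014, 1: 37-51.
[58] SHI Neng, WEI Fengying, FENG Guolin. Monte Carlo tests in correlation analysis and composite analysis of meteorological fields[J]. Journal of Nanjing Institute of Meteorology, 1997, 20(3): 355-359 (in Chinese).
[59] ZWIERS F W, VON STORCH H. On the role of statistics in climate research[J]. Int J Climatol, 2004, 24: 665-680.
[60] HUANG Jiayou. Several issues in the use of statistical tests in meteorology[J]. Meteorological Monthly, 2005, 31(7): 3-5 (in Chinese).
[61] TALEB N N. Fooled by Randomness[M]. New York: Texere, 2001.
[62] LIVEZEY R E, CHEN W Y. Statistical field significance and its determination by Monte Carlo techniques[J]. Mon Wea Rev, 1983, 111(1): 46-59.
[63] BAYLEY G V, HAMMERSLEY J M. The effective number of independent observations in an auto-correlated time series[J]. J Roy Statist Soc, 1946, 8(1B): 184-197.
[64] AMBAUM M H. Significance tests in climate science[J]. J Climate, 2010, 23(22): 5927-5932.
[65] SONG Yan, LI Zhicai, XIAO Ziniu, et al. Interdecadal correlation analysis of solar activity with plateau snow cover and the East Asian circulation[J]. Plateau Meteorology, 2016, 35(5): 1135-1147 (in Chinese).
[66] GUO Huadong, WANG Lizhe, CHEN Fang, et al. Scientific big data and Digital Earth[J]. Chinese Science Bulletin, 2014, 59: 1047-1054 (in Chinese).
[67] TANG Gen. Mastering Data Science: From Linear Regression to Deep Learning[M]. Beijing: Posts & Telecom Press, 2018 (in Chinese).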