Abstract
Heat flow data are fundamental to studying the formation and evolution of the South China Sea. Because of constraints such as marine survey conditions, heat flow stations in the South China Sea are unevenly distributed in space. Given the importance of these data, and to fill in heat flow values for sparsely sampled regions, we predict heat flow across the South China Sea using four machine learning algorithms: support vector machine, gradient boosted regression tree, extreme gradient boosting tree, and random forest. We collected 958 heat flow measurements and excluded 14 outliers during data preprocessing. The remaining 944 measured values were combined with ten geological and geophysical features to form a dataset, which was randomly split into a training set (80%) and a test set (20%). Three metrics were used to quantify the prediction performance of the algorithms. The evaluation shows the following. First, the random forest and gradient boosted regression tree algorithms performed comparatively well, with coefficients of determination (R²) between measured and predicted values of 64% and 60% and relative errors (expressed as the normalized root mean square error) of 0.092 and 0.097, respectively. Second, when the four algorithms were combined, the hybrid model predicted more accurately than some of the single models. Finally, based on these models we produced a new heat flow map of the South China Sea. Compared with traditional kriging interpolation, the machine learning predictions show more detailed features and can provide reference information for regions where heat flow observations are missing.
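The evaluation workflow summarized in the abstract (random 80/20 split, R², normalized RMSE, combining several models) can be sketched as below. This is a minimal illustration and not the authors' code. The abstract does not say how the RMSE is normalized or how the four models are combined, so dividing by the observed range and taking an unweighted mean are both assumptions, and all function names are hypothetical.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (train, test) using the given test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))


def blend(predictions_per_model):
    """Unweighted point-by-point mean of several models' predictions (assumed)."""
    return [sum(values) / len(values) for values in zip(*predictions_per_model)]


# Splitting 944 record indices 80/20 gives 755 training and 189 test records.
train_idx, test_idx = train_test_split(range(944))
print(len(train_idx), len(test_idx))
```

In practice each of the four regressors would be fitted on the training records, and `r2_score` and `nrmse` would be computed on the held-out test records, both for each single model and for the blended prediction.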
Authors
QIN Xue (秦雪)
XU HeHua (许鹤华)
LI Min (李泯)
SHAO Jia (邵佳)
YAO YongJian (姚永坚)
QIN Xue; XU HeHua; LI Min; SHAO Jia; YAO YongJian (Key Laboratory of Ocean and Marginal Sea Geology, South China Sea Institute of Oceanology, Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou 511458, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Aerospace Information Innovation, Chinese Academy of Sciences, Beijing 100094, China; Department of Marine Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Key Laboratory of Submarine Mineral Resources, Ministry of Natural Resources, Guangzhou Marine Geological Survey, Guangzhou 510760, China)
Source
Chinese Journal of Geophysics (《地球物理学报》)
Peking University Core Journal
2025, Issue 8, pp. 3189-3206 (18 pages)
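Funding
Supported by the NSFC-Guangdong Joint Fund (No. U20A20100)
and the Key Project of the Science and Technology Bureau of Nansha District, Guangzhou (2023ZD018).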
Keywords
South China Sea
heat flow
support vector machine
random forest
machine learning