Funding: Supported by the Natural Science Research Project of the Anhui Educational Committee (2023AH030041), the National Natural Science Foundation of China (42277136), and the Anhui Province Young and Middle-aged Teacher Training Action Project (DTR2023018).
Abstract: The selection of negative samples strongly influences landslide susceptibility assessment, especially when the relationship between landslides and environmental factors must be established in regions with complex geological conditions. Traditional sampling strategies commonly used in landslide susceptibility models can misrepresent the distribution of negative samples, so that the sampled distribution deviates from actual geological conditions. This, in turn, degrades the discriminative ability and generalization performance of the models. To address this issue, we propose a novel negative-sample selection approach that improves the quality of machine learning models. We take Liangshan Yi Autonomous Prefecture in southwestern Sichuan, China, as the case study. The area is characterized by complex terrain, frequent tectonic activity, and steep-slope erosion, and it experiences recurrent landslides, which makes it an ideal setting for validating the proposed method. We calculate the contribution values of environmental factors with the Relief algorithm to construct the feature space, apply the Target Space Exteriorization Sampling (TSES) method to select negative samples, compute landslide probability values with a Random Forest (RF) model, and then produce regional landslide susceptibility maps. We evaluate the RF model optimized by the Environmental Factor Selection-based TSES (EFSTSES) method using standard performance metrics. The model achieved an accuracy (ACC) of 0.962, a precision (PRE) of 0.961, and an area under the curve (AUC) of 0.962. These results demonstrate that the EFSTSES-based model effectively mitigates the negative-sample imbalance problem, improves the separation between landslide and non-landslide samples, and reduces misclassification, particularly in geologically complex areas. These improvements offer valuable insights for disaster prevention, land-use planning, and risk mitigation.
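The abstract names TSES but does not define its "target space". As a rough illustration only, the Python sketch below assumes one possible reading: Relief contribution values act as per-factor weights, the target space is the union of weighted-distance balls of a chosen radius around the landslide samples, and negative samples are drawn at random from candidate cells lying outside it. The functions `weighted_distance` and `exterior_negative_samples`, the `radius` parameter, and the ball-based definition of the target space are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance in which each environmental factor is scaled by its weight."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def exterior_negative_samples(
    positives: List[Sequence[float]],
    candidates: List[Sequence[float]],
    weights: Sequence[float],
    radius: float,
    n_samples: int,
    seed: int = 0,
) -> List[Sequence[float]]:
    """Hypothetical TSES-style sampler (an assumed reading, not the published method).

    The "target space" is taken to be every point within `radius` (weighted
    distance) of at least one landslide sample; only candidates outside it
    may become negative samples.
    """
    # Relief scores can be negative; clip them so irrelevant factors do not affect distance
    w = [max(wi, 0.0) for wi in weights]
    exterior = [
        c for c in candidates
        if all(weighted_distance(c, p, w) > radius for p in positives)
    ]
    if len(exterior) < n_samples:
        raise ValueError(f"only {len(exterior)} exterior candidates for {n_samples} requested")
    # Draw without replacement, reproducibly for a given seed
    return random.Random(seed).sample(exterior, n_samples)
```

For example, with factor weights [0.6, 0.4] and a radius of 0.2, a candidate cell at (0.75, 0.88) next to a landslide sample at (0.8, 0.9) is rejected, while distant cells such as (0.1, 0.2) remain eligible as negative samples.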
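Abstract: To overcome the poor adaptability and robustness of the traditional Relief algorithm, a new margin-maximization objective function is constructed by combining margin maximization, information entropy, and local classification consistency; the objective function is then further optimized, yielding several new theoretical results. On this basis, a new Relief feature-weighting algorithm for two-class data, LIE-Relief-T (Local consistency information entropy Relief algorithm based on two-class data), is proposed and extended to a feature-weighting algorithm for multi-class data, LIE-Relief-M (Local consistency information entropy Relief algorithm based on multi-class data). Experiments on UCI and gene expression datasets show that the new Relief feature-weighting algorithm achieves lower classification error rates and exhibits better adaptability and robustness to noise and outliers.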
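The abstract does not state the LIE-Relief objective function, so that algorithm cannot be reproduced here. For reference, the Python sketch below implements the classic two-class Relief update that LIE-Relief-T is described as improving on: each feature weight rises by the feature's difference to the nearest miss (nearest instance of the other class) and falls by its difference to the nearest hit (nearest instance of the same class). The function name `relief`, the Manhattan neighbor distance, and the assumption that features are scaled to [0, 1] are illustrative choices, not taken from the paper.

```python
import random
from typing import List, Sequence


def relief(X: List[Sequence[float]], y: List[int], n_iter: int, seed: int = 0) -> List[float]:
    """Classic two-class Relief feature weighting (baseline, not LIE-Relief).

    Assumes every feature is scaled to [0, 1] and that each class has at least
    two instances, so every sampled point has a nearest hit and a nearest miss.
    """
    if len(set(y)) != 2:
        raise ValueError("classic Relief expects exactly two classes")
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance, consistent with the per-feature |diff| used in the update
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(n)
        xi = X[i]
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = X[min(hits, key=lambda j: dist(xi, X[j]))]
        near_miss = X[min(misses, key=lambda j: dist(xi, X[j]))]
        for f in range(d):
            # Reward features that differ across classes, penalize those that differ within a class
            w[f] += (abs(xi[f] - near_miss[f]) - abs(xi[f] - near_hit[f])) / n_iter
    return w
```

On a toy set where feature 0 separates the classes and feature 1 is noise, X = [(0.0, 0.3), (0.1, 0.9), (0.9, 0.2), (1.0, 0.8)] with y = [0, 0, 1, 1], every iteration contributes +0.8 to feature 0 and -0.5 to feature 1 before averaging, so the weights come out at about [0.8, -0.5]. Because each update relies on a single nearest hit and nearest miss, noise and outliers can distort the weights, which is one source of the limited robustness the abstract mentions.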