Slope units are divided according to the real topography and have clear geological characteristics, making them ideal units for evaluating susceptibility to geological disasters. Based on automatic slope unit division and manually corrected hydrological slope unit division, Longhua District, Shenzhen City, Guangdong Province, was taken as the study area. A total of 15 influencing factors were selected as evaluation indicators: topographic relief, slope, slope aspect, curvature, topographic wetness index (TWI), stream power index (SPI), topographic roughness index (TRI), annual average rainfall, distance to water system, engineering rock group, distance to fault, land use, normalized difference vegetation index (NDVI), nighttime light, and distance to road. The information value (IV) model and random points were used to select non-geological-disaster units, and the random forest (RF) model was then used to evaluate susceptibility to geological disasters. The automatic and hydrological slope units were compared under both the RF model and the coupled IV-RF model. The results show that the area under the curve (AUC) values for the automatic slope units are 0.931 for the IV-RF model and 0.716 for the RF model, which are 0.6% (IV-RF model) and 1.9% (RF model) higher than those for the hydrological slope units. Comparing the two approaches, the evaluation method based on manually corrected hydrological slope units is highly subjective, complicated to operate, and less accurate, whereas the method based on automatic slope unit division is efficient and accurate, is suitable for large-scale geological disaster evaluation, and handles geological disaster susceptibility evaluation better.
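The negative-sample step in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the information value of a factor class is taken in its common form, ln((hazard share of the class) / (unit share of the class)), and the class counts, the helper names `information_value` and `pick_non_hazard_candidates`, and the "summed score below zero" threshold are all hypothetical.

```python
import math
import random

def information_value(hazards_in_class, total_hazards, units_in_class, total_units):
    """Information value of one factor class: ln(hazard share / unit share)."""
    if hazards_in_class == 0:
        # No recorded hazard in this class: treat it as maximally "safe".
        return float("-inf")
    hazard_share = hazards_in_class / total_hazards
    unit_share = units_in_class / total_units
    return math.log(hazard_share / unit_share)

# Hypothetical counts for one factor (slope): class -> (hazard units, all units).
slope_classes = {"gentle": (2, 500), "moderate": (10, 300), "steep": (28, 200)}
total_hazards = sum(h for h, _ in slope_classes.values())
total_units = sum(n for _, n in slope_classes.values())

slope_iv = {
    name: information_value(h, total_hazards, n, total_units)
    for name, (h, n) in slope_classes.items()
}

def pick_non_hazard_candidates(units, iv_tables, threshold=0.0):
    """Return ids of slope units whose summed information value is below threshold."""
    picked = []
    for unit_id, classes in units.items():
        score = sum(iv_tables[factor][cls] for factor, cls in classes.items())
        if score < threshold:
            picked.append(unit_id)
    return picked

# Hypothetical slope units described by their class for each factor.
units = {"u1": {"slope": "gentle"}, "u2": {"slope": "steep"}, "u3": {"slope": "moderate"}}
candidates = pick_non_hazard_candidates(units, {"slope": slope_iv})

# Random points are then drawn only from the low-information-value candidates.
negatives = random.Random(0).sample(candidates, k=1)
```

Drawing negatives only from low-IV units keeps likely but unrecorded hazard sites out of the non-hazard class, which is one plausible reason the coupled IV-RF model scores higher than RF with purely random negatives.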
Accurate Electric Load Forecasting (ELF) is crucial for optimizing production capacity, improving operational efficiency, and managing energy resources effectively. Moreover, precise ELF contributes to a smaller environmental footprint by reducing the risks of disruption, downtime, and waste. However, with increasingly complex energy consumption patterns driven by renewable energy integration and changing consumer behaviors, no single approach has emerged as universally effective. In response, this research presents a hybrid modeling framework that combines the strengths of Random Forest (RF) and Autoregressive Integrated Moving Average (ARIMA) models, enhanced with an advanced feature selection method, Minimum Redundancy Maximum Relevancy and Maximum Synergy (MRMRMS), to produce a sparse model. Additionally, the residual patterns are analyzed to enhance forecast accuracy. High-resolution weather data from Weather Underground and historical energy consumption data from PJM for Duke Energy Ohio and Kentucky (DEO&K) are used in this application. This methodology, termed SP-RF-ARIMA, is evaluated against existing approaches; it demonstrates a reduction of more than 40% in mean absolute error and root mean square error compared to the second-best method.
Stand age plays a crucial role in forest biomass estimation and carbon cycle modeling. Assessing the uncertainty of stand age prediction models and identifying the key driving factors in the modeling process have become major challenges in forestry research. In this study, we selected the Shaanxi-Gansu-Ningxia region of Northwest China as the research area and used multi-source datasets from the summer of 2019 to extract information on spectral, textural, climatic, water balance, and stand characteristics. By integrating the Random Forest (RF) model with Monte Carlo (MC) simulation, we constructed six regression models based on different combinations of features and evaluated the uncertainty of each model. Furthermore, we investigated the driving factors influencing stand age modeling by analyzing the effects of different types of features on age inversion. Model performance and accuracy were assessed using the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²), while the relative root mean square error (rRMSE) was used to quantify model uncertainty. The scenarios with the clearest improvement in accuracy and reduction in uncertainty were Scenario 3, which added climate and water balance information (RMSE = 25.54 yr, MAE = 18.03 yr, R² = 0.51, rRMSE = 19.17%), and Scenario 5, which added stand characteristics (RMSE = 18.47 yr, MAE = 13.05 yr, R² = 0.74, rRMSE = 16.99%). Scenario 6, incorporating all feature types, achieved the highest accuracy (RMSE = 17.60 yr, MAE = 12.06 yr, R² = 0.77, rRMSE = 14.19%). Elevation, minimum temperature, and diameter at breast height (DBH) emerged as the key drivers of stand-age modeling. The proposed method can be used to identify drivers and to quantify uncertainty in stand-age estimation, providing a useful reference for improving model accuracy and uncertainty assessment.
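The four metrics reported in the stand-age abstract above can be computed as in the following Python sketch. It is illustrative only: the abstract does not state how rRMSE is normalized, so dividing RMSE by the mean observed value is an assumption (normalizing by the observed range is also common), and the function name `regression_metrics` and the sample ages are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Return RMSE, MAE, R2 and rRMSE (percent of the observed mean)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    rrmse = 100.0 * rmse / mean_obs                # assumed normalization: observed mean
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "rRMSE": rrmse}

# Hypothetical stand ages in years.
observed = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 65.0, 75.0]
metrics = regression_metrics(observed, predicted)
```

In a Monte Carlo setting, this function would be called once per random train/test split and the spread of the resulting rRMSE values summarized per scenario.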
Detecting cyber attacks in networks connected to the Internet of Things (IoT) is of utmost importance because of the growing vulnerabilities in smart environments. Conventional models, such as Naive Bayes and support vector machine (SVM), as well as ensemble methods, such as Gradient Boosting and eXtreme Gradient Boosting (XGBoost), are often plagued by high computational costs, which makes real-time detection challenging. We therefore propose an attack detection approach that integrates Visual Geometry Group 16 (VGG16), the Artificial Rabbits Optimizer (ARO), and a random forest model to increase detection accuracy and operational efficiency in IoT networks. In the proposed model, VGG16 extracts features from malware images, and the random forest model makes predictions from the extracted features. Additionally, ARO is used to tune the hyperparameters of the random forest model. With an accuracy of 96.36%, the proposed model outperforms the standard models in terms of accuracy, F1-score, precision, and recall. The comparative study highlights the success of our strategy, which improves performance while maintaining a lower computational cost. The method is therefore both effective and well suited to real-time applications.
Zenith wet delay (ZWD) is a key parameter for the precise positioning of global navigation satellite systems (GNSS) and plays a central role in meteorological research. Currently, most models consider only the periodic variability of the ZWD and neglect the effect of nonlinear factors on ZWD estimation. This oversight limits their ability to reflect rapid fluctuations of the ZWD. To capture and predict complicated variations in ZWD more accurately, this paper develops the CRZWD model by combining the GPT3 model with the random forest (RF) algorithm, using 5-year atmospheric profiles from 70 radiosonde (RS) stations across China. With data from 25 external test stations as the reference, the root mean square (RMS) error of the CRZWD model is 29.95 mm. Compared with the GPT3 model and another model using a backpropagation neural network (BPNN), the accuracy improves by 24.7% and 15.9%, respectively. Notably, over 56% of the test stations show an improvement of more than 20% relative to GPT3-ZWD. Further temporal and spatial analyses also demonstrate the significant accuracy and stability advantages of the CRZWD model, indicating its potential for GNSS-based applications.
This work aimed to generate landslide susceptibility maps for the Three Gorges Reservoir (TGR) area, China, using different machine learning models. Three methods, namely the gradient boosting decision tree (GBDT), random forest (RF), and information value (InV) models, were used, and their performances were assessed and compared. In total, 202 landslides were mapped using a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in generating landslide susceptibility maps with the GBDT, RF, and InV models. All of the causative factor maps were resampled to a resolution of 28.5 m. Of the 486289 pixels in the area, 28526 were landslide pixels and 457763 were non-landslide pixels. Finally, landslide susceptibility maps were generated with the three models, and their performances were assessed through receiver operating characteristic (ROC) curves, sensitivity, specificity, overall accuracy (OA), and the kappa coefficient (KAPPA). The results showed that the GBDT, RF, and InV models overall produced reasonably accurate landslide susceptibility maps. Among the three methods, GBDT outperformed the other two and can provide strong technical support for producing landslide susceptibility maps in the TGR area.
Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. To study the spatial pattern of the surface urban heat island (SUHI) in China’s Meihekou City, a method combining Monte Carlo and Random Forest Regression (MC-RFR) is developed to model the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling is added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR together with the future landscape pattern modeled by the combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a linear regression model. By 2030, the overall SUHI effect in Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of sustainability.
Modeling the spatial distribution of soil heavy metals is important in determining the safety of contaminated soils for agricultural use. This study used 60 topsoil samples (0-30 cm), multispectral images (Sentinel-2), spectral indices, and ancillary data to model the spatial distribution of heavy metals in the soils along the Nairobi River. The model was generated using the Random Forest package in R. With R² used to assess prediction accuracy, the Random Forest model generated satisfactory results for all the elements. It also ranked the variables in order of their importance in the overall prediction. Spectral indices were the most important variables within the rankings. The predicted topsoil maps showed high concentrations of cadmium at the eastern end of the river. Cadmium is an impurity in detergents, and this section is close to the Nairobi water sewerage plant, which could be a direct source of cadmium. Some farms had zinc levels above the limit recommended by the World Health Organization. The Random Forest model performed satisfactorily. However, the predictions could be improved further by increasing the spatial resolution of the variables and by adding more predictor variables.
The potential of the Random Forest Model (RFM) for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee river region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil, and land quality indices to identify areas environmentally sensitive to desertification. The RFM was used to predict desertification processes such as soil erosion, salinization, and waterlogging in the watershed, and the information needed to train the classification algorithm was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness, and wetness index), soil properties (pH, organic carbon, clay and sand content), and vulnerability indices were used as explanatory variables. Classification accuracy and the kappa index were calculated for the training and testing datasets. We recorded overall accuracy rates of 87.7% and 72.1% for training and testing sites, respectively. We found a large discrepancy between the overall accuracy rate and the kappa index for the testing dataset (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for soil erosion and for the no-desertification class but poor for salinization and waterlogging. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes in unvisited areas.
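The gap between overall accuracy and the kappa index noted in the abstract above is typical of imbalanced classes, and a small Python sketch makes the mechanism concrete. The confusion matrix below is hypothetical (it is not taken from the study), and `accuracy_and_kappa` is an illustrative helper that implements the standard Cohen's kappa formula.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class case: a dominant class and a rare class.
confusion = [
    [85, 5],  # true dominant class: 85 correct, 5 wrong
    [8, 2],   # true rare class: only 2 of 10 correct
]
accuracy, kappa = accuracy_and_kappa(confusion)
```

Here the overall accuracy is 0.87 while kappa is only about 0.17, because most of the agreement comes from the dominant class and would be expected by chance alone.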
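Objective: Landslides occur frequently in Leiyang City and pose a serious threat to people's lives and property and to ecological security. This study aims to improve the accuracy of landslide susceptibility evaluation. Methods: Taking Leiyang City, Hunan Province, as the study area, an IV-RF model coupling the information value (IV) model with the random forest (RF) model was constructed, and a spatially constrained sampling strategy was introduced to optimize the selection of negative samples for landslide susceptibility evaluation. The three models were compared using ROC curves and AUC values, and a comprehensive performance index was proposed to evaluate overall model performance. Results: (1) The coupled IV-RF model outperformed the single models, with AUC = 0.952 and a comprehensive performance index (Accuracy + F1 + MCC) of 2.593. Landslide points are densely distributed in the very high and high susceptibility zones and very sparse in the very low and low susceptibility zones, confirming that the model has high spatial prediction accuracy. (2) The engineering geological rock group is one of the most important factors affecting landslide development in the study area. Conclusion: By combining the quantitative data interpretation of IV with the nonlinear recognition capability of RF, the coupled IV-RF model effectively improves recognition accuracy. The results provide a scientific basis for landslide risk prevention and control, soil and water conservation, and territorial spatial planning in the study area.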
To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the model and its estimation algorithms. The empirical section applies the proposed method to global air quality data from 2022 to 2024. Specifically, principal component analysis (PCA) is conducted first, and the VAR and Random Forest methods are then used for prediction on the reduced-dimensional data. The results show that the RMSE of the hybrid model is 45.27, significantly lower than the 49.11 of the VAR model alone, verifying its superiority. The stability and predictive performance of the model are effectively enhanced.
A core task in analyzing Electrochemical Impedance Spectroscopy (EIS) data is selecting an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this process often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It automatically selects the most suitable equivalent circuit model based on the characteristics and patterns of the EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel covering five different corrosion scenarios is constructed. This dataset was used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) A method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm is proposed. (2) Using authentic EIS data collected from metal atmospheric corrosion, a dataset encompassing five categories of metal corrosion scenarios is established. (3) The superiority of the proposed method is validated on the established EIS dataset. The experimental results demonstrate that, in terms of equivalent circuit matching, this method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
To enhance the prediction accuracy of landslides in Longyan City, China, this study developed a methodology for geologic hazard susceptibility assessment based on a coupled model composed of a Geographic Information System (GIS) with integrated spatial data, a frequency ratio (FR) model, and a random forest (RF) model (referred to as the coupled FR-RF model). The coupled FR-RF model was constructed from the analysis of nine influential factors, including distance from roads, normalized difference vegetation index (NDVI), and slope. Its performance was assessed using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, yielding Area Under the Curve (AUC) values of 0.93 and 0.95, respectively, which indicate high predictive accuracy and reliability for geological hazard forecasting. Based on the model predictions, five susceptibility levels were determined in the study area, providing crucial spatial information for geologic hazard prevention and control. The contributions of the influential factors to landslide susceptibility were determined using SHapley Additive exPlanations (SHAP) analysis and the Gini index, enhancing model interpretability and transparency. Additionally, this study discusses the limitations of the coupled FR-RF model and the prospects for improving it with new technologies. This study provides an innovative method and theoretical support for geologic hazard prediction and management, with promising prospects for application.
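Several of the abstracts in this listing report ROC AUC values. A minimal Python sketch of what that number measures: the probability that a randomly chosen hazard sample receives a higher susceptibility score than a randomly chosen non-hazard sample. The labels and scores below are hypothetical, and `roc_auc` is an illustrative helper using the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical susceptibility scores: 1 = landslide unit, 0 = non-landslide unit.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy case eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in sample size, which is fine for illustration but not for large rasters.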
The random forest model is a mainstream method for accurately describing the distribution patterns and impact mechanisms of regional population. We took Shijiazhuang as the research area, used comprehensive zoning based on endowments as the modeling unit, conducted stratified sampling on hectare grid cells, and systematically carried out incremental selection experiments on population density impact factors, optimizing the population density random forest model throughout the process (zonal modeling, stratified sampling, factor selection, weighted output). The results are as follows: (1) Zonal modeling addresses the confusion of population distribution patterns caused by using a single model. Sampling on grid cells not only ensures the quality of the training data by avoiding the modifiable areal unit problem (MAUP) but also helps mitigate the adverse effects of the ecological fallacy. Stratified sampling ensures the stability of the population density label values (target variable) in the training sample. (2) Zonal selection experiments on population density impact factors help identify suitable combinations of factors, leading to a significant improvement in the goodness of fit (R²) of the zonal models. (3) Weighted combination output of the population density prediction dataset substantially enhances the model's robustness. (4) The population density dataset exhibits multi-scale superposition characteristics. On a large scale, population density in the plains is higher than in mountainous areas, while on a small scale, urban areas have higher density than rural areas. The proposed optimization scheme for the population density random forest model offers a unified technical framework for uncovering local population distribution patterns and impact mechanisms.
Objective: Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have shown excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to predict mixture samples accurately. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods: In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results: Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman’s correlation analysis revealed a significant, strong correlation between methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1, and 20:1). Two random forest classification models were trained on the methylation levels of the 10 markers: one to predict mixture type and one to predict the proportion of the two components. For mixture prediction, Model-1 achieved an accuracy of 99.3% on 427 training samples and 100% on 243 independent test samples. For mixture proportion prediction, Model-2 achieved an accuracy of 98.8% on 252 training samples and 98.2% on 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for mixture proportions. Conclusion: These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.
Funding (electric load forecasting study): supported by the Startup Grant (PG18929) awarded to F. Shokoohi.
Funding (stand age study): under the auspices of the Natural Science Foundation of China (Nos. 32371875, 32001249).
Funding (IoT attack detection study): funded by Institutional Fund Projects under grant no. IFPDP-261-22.
Funding (zenith wet delay study): supported by the National Natural Science Foundation of China [42030109, 42074012]; the Scientific Study Project for Institutes of Higher Learning, Ministry of Education, Liaoning Province [LJKMZ20220673]; the State Key Laboratory of Geodesy and Earth's Dynamics, Innovation Academy for Precision Measurement Science and Technology [SKLGED2023-3-2]; the Liaoning Revitalization Talent Program [XLYC2203162]; and the Natural Science Foundation of Hebei Province, China [D2023402024].
Funding: This work was supported in part by the National Natural Science Foundation of China (61601418, 41602362, 61871259); in part by the Opening Foundation of Hunan Engineering and Research Center of Natural Resource Investigation and Monitoring (2020-5); in part by the Qilian Mountain National Park Research Center (Qinghai) (grant number: GKQ2019-01); and in part by the Geomatics Technology and Application Key Laboratory of Qinghai Province, Grant No. QHDX-2019-01.
Abstract: This work aimed to generate landslide susceptibility maps for the Three Gorges Reservoir (TGR) area, China, by using different machine learning models. Three methods, namely, the gradient boosting decision tree (GBDT), random forest (RF), and information value (InV) models, were used, and their performances were assessed and compared. In total, 202 landslides were mapped by using a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in landslide susceptibility map generation by using the GBDT, RF, and InV models. All of the maps of the causative factors were resampled to a resolution of 28.5 m. Of the 486289 pixels in the area, 28526 pixels were landslide pixels, and 457763 pixels were non-landslide pixels. Finally, landslide susceptibility maps were generated by using the three models, and their performances were assessed through receiver operating characteristic (ROC) curves, the sensitivity, specificity, overall accuracy (OA), and kappa coefficient (KAPPA). The results showed that the GBDT, RF, and InV models overall produced reasonably accurate landslide susceptibility maps. Among the three methods, GBDT outperformed the other two and can provide strong technical support for producing landslide susceptibility maps in the TGR area.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41977411, 41771383) and the Technology Research Project of the Education Department of Jilin Province (No. JJKH20210445KJ).
Abstract: Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. To study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a method combining Monte Carlo and Random Forest Regression (MC-RFR) is developed to construct the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling is added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted by using MC-RFR and the future landscape pattern modeled by a combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the impact of the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a Linear Regression model. By 2030, the overall SUHI effect of Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers concentrate vegetation and disperse built-up land to weaken the SUHI when constructing new urban areas, for the sake of sustainability.
Abstract: Modeling the spatial distribution of soil heavy metals is important in determining the safety of contaminated soils for agricultural use. This study utilized 60 topsoil samples (0 - 30 cm), multispectral images (Sentinel-2), spectral indices, and ancillary data to model the spatial distribution of heavy metals in the soils along the Nairobi River. The model was generated using the Random Forest package in R. With R² used to assess prediction accuracy, the Random Forest model generated satisfactory results for all the elements. It also ranked the variables in order of their importance in the overall prediction. Spectral indices were the most important variables within the rankings. The predicted topsoil maps showed high concentrations of Cadmium at the easterly end of the river. Cadmium is an impurity in detergents, and this section is in close proximity to the Nairobi water sewerage plant, which could be a direct source of Cadmium. Some farms had Zinc levels above the World Health Organization recommended limit. The Random Forest model performed satisfactorily. However, the predictions could be improved further by increasing the spatial resolution of the variables and by adding more predictor variables.
Abstract: The potential of the Random Forest Model for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee river region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil, and land quality indices to identify areas environmentally sensitive to desertification. The Random Forest Model (RFM) was used to predict desertification processes such as soil erosion, salinization, and waterlogging in the watershed, and the information needed to train the classification algorithms was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness, and wetness index), soil properties (pH, organic carbon, clay and sand content), and vulnerability indices were used as explanatory variables. Classification accuracy and kappa index were calculated for the training and testing datasets. We recorded an overall accuracy rate of 87.7% and 72.1% for training and testing sites, respectively. We found larger discrepancies between the overall accuracy rate and kappa index for the testing datasets (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for the soil erosion and no-desertification classes but poor for the salinization and waterlogging classes. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes in unvisited areas.
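The gap between overall accuracy and kappa reported above is typical when one class dominates the sample. The following minimal sketch computes both measures from a confusion matrix in plain Python. The matrix `cm` is a hypothetical illustration, not data from the study; it only shows how a high accuracy can coexist with a low kappa when minority classes are poorly predicted.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if predictions were independent of the reference
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 3-class matrix (rows = reference, columns = predicted).
# The first class dominates; the two minority classes are mostly missed.
cm = [
    [70, 5, 5],
    [10, 3, 2],
    [3, 1, 1],
]

acc = overall_accuracy(cm)   # 0.74
kappa = cohen_kappa(cm)      # roughly 0.18
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

Because the dominant class alone yields a high chance agreement, kappa stays low even though most samples are labeled correctly, which is consistent with the abstract's reading that not all classes are predicted well.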
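Abstract: [Objective] Landslides occur frequently in Leiyang City and pose a serious threat to lives, property, and ecological security. To improve the accuracy of landslide susceptibility assessment, [Methods] taking Leiyang City, Hunan Province, as the study area, an IV-RF model coupling the information value model (IV) with the random forest model (RF) was constructed, and a spatially constrained sampling strategy was introduced to optimize the selection of negative samples for landslide susceptibility assessment. The three models were compared using ROC curves and AUC values, and a comprehensive performance index was proposed to evaluate overall model performance. [Results] 1) The coupled IV-RF model outperformed the single models, with AUC = 0.952 and a comprehensive performance index (Accuracy + F1 + MCC) of 2.593. Landslide points were densely distributed in the very high and high susceptibility zones and very sparse in the very low and low susceptibility zones, confirming that the model has high spatial prediction accuracy. 2) The engineering geological rock group is one of the most important factors influencing landslide development in the study area. [Conclusion] By combining the quantitative data interpretation of IV with the nonlinear recognition capability of RF, the coupled IV-RF model effectively improves recognition accuracy. The results can provide a scientific basis for landslide risk prevention and control, soil and water conservation, and territorial spatial planning in the study area.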
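The comprehensive performance index in the abstract above is given only as the sum Accuracy + F1 + MCC. As a rough sketch of how such a sum could be computed, the snippet below uses the standard binary-classification definitions of the three terms; those definitions, the function name `composite_index`, and the confusion counts are assumptions for illustration and are not taken from the paper.

```python
import math


def composite_index(tp, fp, fn, tn):
    """Return (accuracy, f1, mcc, accuracy + f1 + mcc) for binary counts.

    Assumes every marginal total is non-zero, so no division by zero occurs.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient, ranging from -1 to 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, f1, mcc, accuracy + f1 + mcc


# Hypothetical counts: 50 landslide and 50 non-landslide test samples
acc, f1, mcc, total = composite_index(tp=45, fp=5, fn=5, tn=45)
print(f"accuracy={acc:.3f}, f1={f1:.3f}, mcc={mcc:.3f}, index={total:.3f}")
```

Under these definitions the index has an upper bound of 3 (a perfect classifier), which gives a sense of scale for the reported value of 2.593.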
Abstract: To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the models and their estimation algorithms. The empirical analysis section applies the proposed method to global air quality data from 2022 to 2024. Specifically, principal component analysis (PCA) is first conducted, and then the VAR and Random Forest methods are used for prediction on the dimension-reduced data. The results show that the RMSE of the hybrid model is 45.27, lower than the 49.11 of the VAR model alone, supporting its superiority. The stability and predictive performance of the model are effectively enhanced.
Funding: Supported by the National Key R&D Program of China project "Research and Application of Sensing System for Cross-regional Complex Oil & Gas Pipeline Network Safe and Efficiency Operational Status Monitoring" (Grant No. 2022YFB3207603).
Abstract: A core task in analyzing Electrochemical Impedance Spectroscopy (EIS) data is selecting an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this selection often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It automatically selects the most suitable equivalent circuit model based on the characteristics and patterns of the EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel covering five different corrosion scenarios is constructed and used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) This paper proposes a method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm. (2) Using authentic EIS data collected from metal atmospheric corrosion, the paper establishes a dataset encompassing five categories of metal corrosion scenarios. (3) The superiority of the proposed method is validated using the established authentic EIS dataset. The experimental results demonstrate that, in terms of equivalent circuit matching, this method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
Funding: Supported by the project of the China Geological Survey (DD20230591).
Abstract: To enhance the prediction accuracy of landslides in Longyan City, China, this study developed a methodology for geologic hazard susceptibility assessment based on a coupled model composed of a Geographic Information System (GIS) with integrated spatial data, a frequency ratio (FR) model, and a random forest (RF) model (also referred to as the coupled FR-RF model). The coupled FR-RF model was constructed based on the analysis of nine influential factors, including distance from roads, normalized difference vegetation index (NDVI), and slope. The performance of the coupled FR-RF model was assessed using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, yielding Area Under the Curve (AUC) values of 0.93 and 0.95, respectively, which indicate high predictive accuracy and reliability for geological hazard forecasting. Based on the model predictions, five susceptibility levels were determined in the study area, providing crucial spatial information for geologic hazard prevention and control. The contributions of the influential factors to landslide susceptibility were determined using SHapley Additive exPlanations (SHAP) analysis and the Gini index, enhancing model interpretability and transparency. Additionally, this study discussed the limitations of the coupled FR-RF model and the prospects for its improvement using new technologies. This study provides an innovative method and theoretical support for geologic hazard prediction and management, holding promising prospects for application.
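For readers unfamiliar with the frequency ratio step, a minimal sketch follows. It uses the common definition of FR as the share of landslides falling in a factor class divided by the share of the study area occupied by that class; whether the paper uses exactly this form is an assumption. The slope classes and counts are hypothetical.

```python
def frequency_ratio(landslides_in_class, landslides_total,
                    cells_in_class, cells_total):
    """FR = (landslide share in class) / (area share of class)."""
    landslide_share = landslides_in_class / landslides_total
    area_share = cells_in_class / cells_total
    return landslide_share / area_share


# Hypothetical slope classes: (landslide count, grid-cell count)
slope_classes = {
    "0-10 deg": (10, 5000),
    "10-25 deg": (50, 3000),
    ">25 deg": (40, 2000),
}

landslides_total = sum(n for n, _ in slope_classes.values())
cells_total = sum(c for _, c in slope_classes.values())

fr_by_class = {
    name: frequency_ratio(n, landslides_total, c, cells_total)
    for name, (n, c) in slope_classes.items()
}

for name, fr in fr_by_class.items():
    # FR > 1 means landslides are over-represented in the class
    print(f"{name}: FR = {fr:.2f}")
```

In a coupled workflow, one common option is to replace each factor class by its FR value before training the random forest, so that categorical and continuous factors share a comparable numeric scale; this is offered as a general pattern, not as the authors' exact procedure.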
Funding: National Natural Science Foundation of China, No. 42071167, No. 42201197, No. 40871073; The Second Tibetan Plateau Scientific Expedition and Research Program, No. 2019QZKK0406; Natural Science Foundation of Hebei Province, No. D2007000272.
Abstract: The random forest model is a mainstream research method for accurately describing the distribution law and impact mechanism of regional population. We took Shijiazhuang as the research area, with comprehensive zoning based on endowments as the modeling unit, conducted stratified sampling on a hectare grid cell, and systematically carried out incremental selection experiments of population density impact factors, optimizing the population density random forest model throughout the process (zonal modeling, stratified sampling, factor selection, weighted output). The results are as follows: (1) Zonal modeling addresses the confusion in population distribution laws caused by a single model. Sampling on a grid cell not only ensures the quality of training data by avoiding the modifiable areal unit problem (MAUP) but also attempts to mitigate the adverse effects of the ecological fallacy. Stratified sampling ensures the stability of population density label values (the target variable) in the training sample. (2) Zonal selection experiments on population density impact factors help identify suitable combinations of factors, leading to a significant improvement in the goodness of fit (R²) of the zonal models. (3) Weighted combination output of the population density prediction dataset substantially enhances the model's robustness. (4) The population density dataset exhibits multi-scale superposition characteristics. On a large scale, the population density in plains is higher than that in mountainous areas, while on a small scale, urban areas have higher density than rural areas. The optimization scheme that we propose for the population density random forest model offers a unified technical framework for uncovering local population distribution laws and their impact mechanisms.
Funding: Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
Abstract: Objective: Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods: In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results: Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significantly strong correlation between the methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1 and 20:1). Two random forest classification models were trained, one for predicting mixture types and one for predicting the mixture proportion of 2 components, based on the methylation levels of the 10 markers. For mixture prediction, Model-1 reached an accuracy of 99.3% in 427 training samples and 100% in 243 independent test samples. For mixture proportion prediction, Model-2 reached an accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion: These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.