Detecting cyber attacks in networks connected to the Internet of Things (IoT) is of utmost importance because of the growing vulnerabilities in smart environments. Conventional models, such as Naive Bayes and support vector machine (SVM), as well as ensemble methods, such as Gradient Boosting and eXtreme Gradient Boosting (XGBoost), are often plagued by high computational costs, which makes real-time detection challenging. In this regard, we propose an attack detection approach that integrates Visual Geometry Group 16 (VGG16), the Artificial Rabbits Optimizer (ARO), and a random forest model to increase detection accuracy and operational efficiency in IoT networks. In the proposed model, VGG16 extracts features from malware images, and the random forest model makes predictions from the extracted features. Additionally, ARO is used to tune the hyper-parameters of the random forest model. With an accuracy of 96.36%, the proposed model outperforms the standard models in terms of accuracy, F1-score, precision, and recall. The comparative study highlights the success of our strategy, which improves performance while maintaining a lower computational cost. The method is therefore both effective and well suited to real-time applications.
The random forest model is a mainstream method for accurately describing the distribution law and impact mechanism of regional population. We took Shijiazhuang as the research area, used comprehensive zoning based on endowments as the modeling unit, conducted stratified sampling on hectare grid cells, and systematically carried out incremental selection experiments on population density impact factors, optimizing the population density random forest model throughout the process (zonal modeling, stratified sampling, factor selection, weighted output). The results are as follows: (1) Zonal modeling addresses the confusion in population distribution laws caused by a single model. Sampling on grid cells not only ensures the quality of training data by avoiding the modifiable areal unit problem (MAUP) but also attempts to mitigate the adverse effects of the ecological fallacy. Stratified sampling ensures the stability of population density label values (the target variable) in the training sample. (2) Zonal selection experiments on population density impact factors help identify suitable combinations of factors, leading to a significant improvement in the goodness of fit (R²) of the zonal models. (3) Weighted combination output of the population density prediction dataset substantially enhances the model's robustness. (4) The population density dataset exhibits multi-scale superposition characteristics. On a large scale, population density in the plains is higher than in mountainous areas, while on a small scale, urban areas have higher density than rural areas. The optimization scheme that we propose for the population density random forest model offers a unified technical framework for uncovering local population distribution laws and impact mechanisms.
The potential of the Random Forest Model for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee river region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil and land quality indices to identify areas environmentally sensitive to desertification. The Random Forest Model (RFM) was used to predict desertification processes such as soil erosion, salinization and waterlogging in the watershed, and the information needed to train the classification algorithms was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness, and wetness index), soil properties (pH, organic carbon, clay and sand content) and vulnerability indices were used as explanatory variables. Classification accuracy and the kappa index were calculated for the training and testing datasets. We recorded overall accuracy rates of 87.7% and 72.1% for training and testing sites, respectively. We found a large discrepancy between the overall accuracy rate and the kappa index for the testing datasets (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for the soil erosion and no-desertification classes but poor for the salinization and waterlogging classes. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes in unvisited areas.
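The gap between overall accuracy and the kappa index reported above can be illustrated with a small worked example. The following is a minimal sketch, assuming the standard definition of Cohen's kappa; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT from the study): one dominant class, two rare ones.
confusion = [
    [70, 5, 5],
    [8, 1, 1],
    [8, 1, 1],
]
acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

When one class dominates, a classifier can score well on overall accuracy mostly by predicting that class, while kappa stays low because it discounts the agreement expected by chance.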
Objective: Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods: In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results: Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significantly strong correlation between the methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1 and 20:1). Two random forest classification models were trained, one to predict mixture types and one to predict the mixture proportion of 2 components, based on the methylation levels of the 10 markers. For mixture prediction, Model-1 reached a prediction accuracy of 99.3% in 427 training samples and 100% in 243 independent test samples. For mixture proportion prediction, Model-2 reached an accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion: These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.
Modeling the spatial distribution of soil heavy metals is important in determining the safety of contaminated soils for agricultural use. This study utilized 60 topsoil samples (0-30 cm), multispectral images (Sentinel-2), spectral indices, and ancillary data to model the spatial distribution of heavy metals in the soils along the Nairobi River. The model was generated using the Random Forest package in R. Using R² to assess the prediction accuracy, the Random Forest model generated satisfactory results for all the elements. It also ranked the variables in order of their importance in the overall prediction. Spectral indices were the most important variables within the rankings. From the predicted topsoil maps, there were high concentrations of cadmium on the easterly end of the river. Cadmium is an impurity in detergents, and this section is in close proximity to the Nairobi water sewerage plant, which could be a direct source of cadmium. Some farms had zinc levels above the World Health Organization recommended limit. The Random Forest model performed satisfactorily. However, the predictions can be improved further if the spatial resolutions of the various variables are increased and more predictor variables are added.
To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. In the theoretical section, the model introduction and estimation algorithms are provided. In the empirical analysis section, global air quality data from 2022 to 2024 are used, and the proposed method is applied. Specifically, principal component analysis (PCA) is first conducted, and then VAR and Random Forest methods are used for prediction on the reduced-dimensional data. The results show that the RMSE of the hybrid model is 45.27, significantly lower than the 49.11 of the VAR model alone, verifying its superiority. The stability and predictive performance of the model are effectively enhanced.
This work aimed to generate landslide susceptibility maps for the Three Gorges Reservoir (TGR) area, China, by using different machine learning models. Three advanced machine learning methods, namely the gradient boosting decision tree (GBDT), random forest (RF) and information value (InV) models, were used, and their performances were assessed and compared. In total, 202 landslides were mapped by using a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in landslide susceptibility map generation by using the GBDT, RF and InV models. All of the maps of the causative factors were resampled to a resolution of 28.5 m. Of the 486289 pixels in the area, 28526 pixels were landslide pixels, and 457763 pixels were non-landslide pixels. Finally, landslide susceptibility maps were generated by using the three machine learning models, and their performances were assessed through receiver operating characteristic (ROC) curves, sensitivity, specificity, overall accuracy (OA), and the kappa coefficient (KAPPA). The results showed that the GBDT, RF and InV models overall produced reasonably accurate landslide susceptibility maps. Among the three methods, the GBDT method outperformed the other two, and it can provide strong technical support for producing landslide susceptibility maps in the TGR area.
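The pixel counts above show why sensitivity and specificity are reported alongside overall accuracy. The following is a minimal sketch using the standard confusion-matrix definitions; the "predict non-landslide everywhere" baseline is a hypothetical illustration, not a model from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, overall accuracy) from confusion counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # share of landslide pixels correctly flagged
    specificity = tn / (tn + fp)  # share of non-landslide pixels correctly cleared
    overall_accuracy = (tp + tn) / total
    return sensitivity, specificity, overall_accuracy


# Pixel counts quoted in the abstract.
LANDSLIDE_PIXELS = 28526
NON_LANDSLIDE_PIXELS = 457763

# Hypothetical trivial baseline: label every pixel as non-landslide.
sens, spec, oa = binary_metrics(
    tp=0, fn=LANDSLIDE_PIXELS, fp=0, tn=NON_LANDSLIDE_PIXELS
)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, OA={oa:.3f}")
```

Because landslide pixels make up only about 6% of the area, this baseline reaches a high overall accuracy while detecting no landslides at all, which is what sensitivity exposes.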
Car-following models are the research basis of traffic flow theory and microscopic traffic simulation. In previous work, theory-driven models are dominant, while data-driven ones are relatively rare. In recent years, Intelligent Transportation System (ITS) technologies, represented by Vehicle to Everything (V2X) technology, have been developing rapidly. Using ITS technologies, large-scale, high-quality microscopic vehicle trajectory data can be acquired, which provides the research foundation for modeling car-following behavior with data-driven methods. On this basis, a data-driven car-following model based on the Random Forest (RF) method was constructed in this work, and the Next Generation Simulation (NGSIM) dataset was used to calibrate and train the constructed model. The Artificial Neural Network (ANN) model, GM model, and Full Velocity Difference (FVD) model are employed to comparatively verify the proposed model. The results suggest that the model proposed in this work can accurately describe car-following behavior, with better performance under multiple performance indicators.
This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns from the Singapore Exchange (SGX). Instead of conventional ways of modeling stock prices, we construct models which relate the market indicators to a trading decision directly. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allow three modes of trades, namely buy, sell or stand by; the stand-by case is important as it caters to market conditions where a model does not produce a strong signal of buy or sell. Linear trading models are first developed with the scoring technique, which gives higher weights to successful indicators, as well as with the Least Squares technique, which tries to match the past perfect trades with its weights. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can sometimes be highly nonlinear, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique, the Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index, and statistical tests such as randomness and linear and nonlinear correlations are conducted on the data to check the statistical significance of the inputs and their relation with the output before a model is trained. Our empirical results show that the proposed trading methods are able to generate excess returns compared with the buy-and-hold strategy.
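The three-mode decision described above (buy, sell or stand by) can be sketched as a thresholded linear score. This is only an illustration under stated assumptions: the function name `trade_signal`, the indicator encoding in [-1, 1], the weights, and the symmetric threshold are all hypothetical and are not the paper's scoring or Least Squares technique.

```python
def trade_signal(indicators, weights, threshold):
    """Map weighted market indicators to 'buy', 'sell' or 'stand by'.

    indicators: values in [-1, 1], positive meaning bullish (assumed encoding).
    weights:    one non-negative weight per indicator (hypothetical values).
    threshold:  minimum absolute score required to act (hypothetical value).
    """
    score = sum(w * x for w, x in zip(weights, indicators))
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    # Weak or conflicting signals: take no position.
    return "stand by"


weights = [0.5, 0.3, 0.2]
print(trade_signal([1, 1, -1], weights, threshold=0.5))
print(trade_signal([-1, -1, 1], weights, threshold=0.5))
print(trade_signal([1, -1, 1], weights, threshold=0.5))
```

The band between the two thresholds is what distinguishes this from a binary buy/sell system: scores that fall inside it produce no trade.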
This study investigates the application of machine learning models to after-sales service issues in cross-border e-commerce, focusing on predicting order returns to reduce return costs and optimize customer experience. Using cross-border e-commerce company H as a case study, the research employs Random Forest and XGBoost models to identify high-risk return orders. By comparing the performance of these two models, the study highlights their respective strengths and weaknesses and proposes optimization strategies. The findings provide a valuable reference for e-commerce companies to refine their business models, reduce return rates, improve operational efficiency, and enhance customer satisfaction.
The "Yarlung Zangbo River, Lhasa River and Nyangqu River" (YLN) region is the main grain producing area on which the Tibetan people depend for survival. The densities of soil organic carbon (SOC), total nitrogen (TN) and total phosphorus (TP) in farmlands are closely related to grain production. Scientific management and regulation of these nutrient densities are of great significance for ensuring food security. However, accurate simulations of the spatial variations in the densities of SOC (SOCD), TN (TND) and TP (TPD) are still lacking, and the spatial distributions of SOCD, TND and TPD remain unclear. In this study, 388 samples of cultivated soils at 0–10 and 10–20 cm in the YLN region were collected to determine the SOC, TN, and TP contents, as well as pH and bulk density (BD). Random forest models of SOCD, TND and TPD were constructed using longitude, latitude, elevation, mean annual temperature, mean annual precipitation, mean annual radiation and vegetation index, and were then used to obtain spatial distribution maps of SOCD, TND and TPD and the storages of SOC (SOCS), TN (TNS) and TP (TPS). Mean annual radiation can partially explain the spatial variations of SOCD and TND, in addition to temperature and precipitation. The relative biases between modelled and observed SOCD, TND, TPD, SOCS, TNS and TPS ranged from -9.43% to 7.57%. The SOCD and TND increased from west to east, but they were both low in the middle and high in the north and south. The SOCD and TND decreased with increasing pH and BD. SOCD, TND and TPD were low at mid-elevations but high at low and high elevations. The SOCD, TND, TPD, SOCS, TNS and TPS were 2.72 kg m⁻², 0.30 kg m⁻², 0.18 kg m⁻², 4.88 Tg, 0.54 Tg and 0.32 Tg, respectively, at 0–20 cm over the cultivated lands of the YLN region. Based on these results, the random forest models constructed in this study can be used for subsequent related studies. Besides warming and precipitation changes, radiation changes can also affect SOCD and TND. In terms of the production of food crops such as highland barley, the farmland soils in the YLN region may currently be relatively deficient in nitrogen and phosphorus. In the future, measures such as increasing the application of organic fertilizers should be taken to improve the carbon sequestration capacity and the nitrogen and phosphorus nutrition of the soil. These findings provide important guidance for the fertilization management of cultivated lands in the YLN region and in similar alpine regions.
Height–diameter relationships are essential elements of forest assessment and modeling efforts. In this work, two linear and eighteen nonlinear height–diameter equations were evaluated to find a local model for Oriental beech (Fagus orientalis Lipsky) in the Hyrcanian Forest in Iran. The predictive performance of these models was first assessed by different evaluation criteria: adjusted R² (R²_adj), root mean square error (RMSE), relative RMSE (%RMSE), bias, and relative bias (%bias). The best model was selected for use as the base mixed-effects model. Random parameters for test plots were estimated with different tree selection options. Results show that the Chapman–Richards model had better predictive ability than the other models in terms of R²_adj (0.81), RMSE (3.7 m), %RMSE (12.9), bias (0.8), and %bias (2.79). Furthermore, the calibration response, based on a selection of four trees from the sample plots, resulted in a reduction in bias and RMSE of about 1.6–2.7%. Our results indicate that the calibrated model produced the most accurate results.
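The Chapman–Richards curve and the evaluation criteria named above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a common three-parameter form with a 1.3 m breast-height offset, defines bias as mean(observed - predicted), and expresses relative values against the mean observed height. The abstract does not give the exact form or definitions used, and the parameters and tree measurements below are hypothetical.

```python
import math


def chapman_richards(dbh_cm, a, b, c, breast_height_m=1.3):
    """Predicted tree height (m) from diameter at breast height (cm).

    Assumed form: h = 1.3 + a * (1 - exp(-b * d)) ** c
    """
    return breast_height_m + a * (1.0 - math.exp(-b * dbh_cm)) ** c


def fit_statistics(observed, predicted):
    """RMSE, bias and their relative (%) versions against the mean observed value."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    bias = sum(residuals) / n
    return {
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / mean_obs,
        "bias": bias,
        "bias_pct": 100.0 * bias / mean_obs,
    }


# Hypothetical parameters and measurements (NOT from the study).
params = {"a": 35.0, "b": 0.03, "c": 1.2}
dbh = [20.0, 40.0, 60.0]
heights = [15.0, 25.0, 31.0]
predicted = [chapman_richards(d, **params) for d in dbh]
print(fit_statistics(heights, predicted))
```

Under these definitions, the reported RMSE of 3.7 m with a %RMSE of 12.9, and the bias of 0.8 with a %bias of 2.79, would both correspond to a mean observed height of about 28.7 m, so the reported figures are at least consistent with this convention.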
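This study explores the relationship between the distribution of Hubei field songs (tiange) and the geographical environment in which they arose, with the aim of providing new ideas and methods for empirical research on regional music. Taking Hubei field songs as the research object, a dataset of 1,248 field song samples was selected, and a geographic information system (GIS) was used to build a database of the preliminarily selected field song distribution and the factors influencing musical elements. An analysis model of the field song influencing-factor system was constructed based on random forest and the interpretability algorithm Shapley additive explanations (SHAP), and the validity of the model was evaluated with the receiver operating characteristic (ROC) curve in order to analyze the relationships among field song distribution, musical elements, and the geographical environment. The results show that: 1) the influencing-factor model built on random forest predicts well, with an area under the curve (AUC) of 0.82; 2) ranking the importance of the factors influencing the emergence of field songs and their musical elements shows that multi-year mean rainfall and multi-year mean temperature are the main factors that nurtured Hubei field songs. The random forest and SHAP algorithms can, to some extent, predict the distribution pattern of Hubei field songs, which is of considerable significance for research on the relationship between regional music culture and geography.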
Funding (IoT attack detection study): Funded by Institutional Fund Projects under grant no. IFPDP-261-22.
Funding (Shijiazhuang population density study): National Natural Science Foundation of China, No. 42071167, No. 42201197, No. 40871073; The Second Tibetan Plateau Scientific Expedition and Research Program, No. 2019QZKK0406; Natural Science Foundation of Hebei Province, No. D2007000272.
Funding (body fluid mixture study): Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
Funding (Three Gorges Reservoir landslide study): This work was supported in part by the National Natural Science Foundation of China (61601418, 41602362, 61871259), in part by the Opening Foundation of Hunan Engineering and Research Center of Natural Resource Investigation and Monitoring (2020-5), in part by the Qilian Mountain National Park Research Center (Qinghai) (grant number GKQ2019-01), and in part by the Geomatics Technology and Application Key Laboratory of Qinghai Province (Grant No. QHDX-2019-01).
文摘This work was to generate landslide susceptibility maps for the Three Gorges Reservoir(TGR) area, China by using different machine learning models. Three advanced machine learning methods, namely, gradient boosting decision tree(GBDT), random forest(RF) and information value(InV) models, were used, and the performances were assessed and compared. In total, 202 landslides were mapped by using a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in landslide susceptibility map generation by using the GBDT, RF and InV models. All of the maps of the causative factors were resampled to a resolution of 28.5 m. Of the 486289 pixels in the area,28526 pixels were landslide pixels, and 457763 pixels were non-landslide pixels. Finally, landslide susceptibility maps were generated by using the three machine learning models, and their performances were assessed through receiver operating characteristic(ROC) curves, the sensitivity, specificity,overall accuracy(OA), and kappa coefficient(KAPPA). The results showed that the GBDT, RF and In V models in overall produced reasonable accurate landslide susceptibility maps. Among these three methods, the GBDT method outperforms the other two machine learning methods, which can provide strong technical support for producing landslide susceptibility maps in TGR.
文摘The car-following models are the research basis of traffic flow theory and microscopic traffic simulation. Among the previous work, the theory-driven models are dominant, while the data-driven ones are relatively rare. In recent years, the related technologies of Intelligent Transportation System (ITS) re</span><span style="font-family:Verdana;">- </span><span style="font-family:Verdana;">presented by the Vehicles to Everything (V2X) technology have been developing rapidly. Utilizing the related technologies of ITS, the large-scale vehicle microscopic trajectory data with high quality can be acquired, which provides the research foundation for modeling the car-following behavior based on the data-driven methods. According to this point, a data-driven car-following model based on the Random Forest (RF) method was constructed in this work, and the Next Generation Simulation (NGSIM) dataset was used to calibrate and train the constructed model. The Artificial Neural Network (ANN) model, GM model, and Full Velocity Difference (FVD) model are em</span><span style="font-family:Verdana;">- </span><span style="font-family:Verdana;">ployed to comparatively verify the proposed model. The research results suggest that the model proposed in this work can accurately describe the car-</span><span style="font-family:Verdana;"> </span><span style="font-family:Verdana;">following behavior with better performance under multiple performance indicators.
文摘This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns on the Singapore Exchange (SGX). Instead of modeling stock prices in the conventional way, we construct models that relate market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allow three modes of trade, namely buy, sell or stand by. The stand-by mode is important because it caters to market conditions in which a model does not produce a strong buy or sell signal. Linear trading models are first developed with a scoring technique that gives higher weights to successful indicators, and with a Least Squares technique that fits the weights to past perfect trades. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can sometimes be highly nonlinear, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique, the Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index. Before a model is trained, statistical tests for randomness and for linear and nonlinear correlation are conducted on the data to check the statistical significance of the inputs and their relation to the output. Our empirical results show that the proposed trading methods are able to generate excess returns compared with the buy-and-hold strategy.
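The three-mode decision and the forgetting factor described in this abstract can be illustrated with a small sketch. The abstract does not give the exact scoring rule, so everything below is an assumption made for illustration: the functions `update_weights` and `decide`, the dead-band `threshold`, and the rule that an indicator's weight grows by 1 each time its signal was correct.

```python
def update_weights(weights, hits, forgetting=0.95):
    """Decay old weights by the forgetting factor, then reward the
    indicators whose last signal was correct (hits[i] is True).
    This update rule is a hypothetical stand-in for the paper's scoring."""
    return [forgetting * w + (1.0 if hit else 0.0)
            for w, hit in zip(weights, hits)]


def decide(signals, weights, threshold=0.5):
    """Combine indicator signals (+1 buy, -1 sell, 0 neutral) into a
    weighted score in [-1, 1] and map it to one of three trade modes."""
    total = sum(weights)
    if total == 0:
        return "stand by"
    score = sum(w * s for w, s in zip(weights, signals)) / total
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "stand by"   # signal too weak in either direction


if __name__ == "__main__":
    w = update_weights([1.0, 1.0, 1.0], [True, True, False])
    print(w, decide([1, 1, -1], w))
```

A smaller forgetting factor makes the weights track recent market behaviour more closely at the cost of noisier decisions; a wider dead band produces more stand-by decisions.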
文摘This study investigates the application of machine learning models to after-sales service issues in cross-border e-commerce, focusing on predicting order returns to reduce return costs and optimize customer experience. Using cross-border e-commerce company H as a case study, the research employs Random Forest and XGBoost models to identify high-risk return orders. By comparing the performance of these two models, the study highlights their respective strengths and weaknesses and proposes optimization strategies. The findings provide a valuable reference for e-commerce companies to refine their business models, reduce return rates, improve operational efficiency, and enhance customer satisfaction.
基金The Lhasa Science and Technology Plan Project (LSKJ202422); the Xizang Autonomous Region Science and Technology Project (XZ202501ZY0056).
文摘The "Yarlung Zangbo River, Lhasa River and Nyangqu River" (YLN) region is the main grain-producing area on which the Tibetan people depend for survival. The densities of soil organic carbon (SOC), total nitrogen (TN) and total phosphorus (TP) in farmlands are closely related to grain production. Scientific management and regulation of these nutrient densities are of great significance for ensuring food security. However, accurate simulations of the spatial variations in the densities of SOC (SOCD), TN (TND) and TP (TPD) are still lacking, and the spatial distributions of SOCD, TND and TPD remain unclear. In this study, 388 samples of cultivated soils at 0–10 and 10–20 cm in the YLN region were collected to determine the SOC, TN and TP contents, as well as pH and bulk density (BD). Random forest models of SOCD, TND and TPD were constructed using longitude, latitude, elevation, mean annual temperature, mean annual precipitation, mean annual radiation and vegetation index. The models were then used to obtain the spatial distribution maps of SOCD, TND and TPD, and the storages of SOC (SOCS), TN (TNS) and TP (TPS). In addition to temperature and precipitation, mean annual radiation can partially explain the spatial variations of SOCD and TND. The relative biases between modelled and observed SOCD, TND, TPD, SOCS, TNS and TPS ranged from -9.43% to 7.57%. The SOCD and TND increased from west to east, but both were low in the middle and high in the north and south. The SOCD and TND decreased with increasing pH and BD. SOCD, TND and TPD were low at mid-elevations but high at low and high elevations. The SOCD, TND, TPD, SOCS, TNS and TPS were 2.72 kg m^(-2), 0.30 kg m^(-2), 0.18 kg m^(-2), 4.88 Tg, 0.54 Tg and 0.32 Tg, respectively, at 0–20 cm over the cultivated lands of the YLN region. Based on these results, the random forest models constructed in this study can be used for subsequent related studies. Besides warming and precipitation changes, radiation changes can also affect SOCD and TND. In terms of the production of food crops such as highland barley, the farmland soils in the YLN region may currently be relatively deficient in nitrogen and phosphorus. In the future, measures such as increasing the application of organic fertilizers should be taken to improve the carbon sequestration capacity and the nitrogen and phosphorus nutrition of the soil. These findings have important guiding significance for the fertilization management of cultivated lands in the YLN region and in similar alpine regions.
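Two quantities in this abstract follow from simple arithmetic: the relative bias between modelled and observed values, and the conversion of an areal density (kg m^(-2)) into a storage (Tg). A minimal sketch follows, assuming the common definition of relative bias as (modelled - observed) / observed; the abstract does not state the formula or the cultivated area, so the function names and the example inputs are hypothetical.

```python
KG_PER_TG = 1.0e9  # 1 Tg = 10^12 g = 10^9 kg


def relative_bias(modelled, observed):
    """Relative bias in percent; positive means the model overestimates."""
    return 100.0 * (modelled - observed) / observed


def storage_tg(density_kg_m2, area_m2):
    """Storage in Tg from a mean areal density and the area it covers."""
    return density_kg_m2 * area_m2 / KG_PER_TG


if __name__ == "__main__":
    # Hypothetical inputs, not values taken from the study.
    print(relative_bias(modelled=1.1, observed=1.0))
    print(storage_tg(density_kg_m2=2.0, area_m2=1.0e9))
```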
基金This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
文摘Height–diameter relationships are essential elements of forest assessment and modeling efforts. In this work, two linear and eighteen nonlinear height–diameter equations were evaluated to find a local model for Oriental beech (Fagus orientalis Lipsky) in the Hyrcanian Forest in Iran. The predictive performance of these models was first assessed with several evaluation criteria: adjusted R^2 (R^2_adj), root mean square error (RMSE), relative RMSE (%RMSE), bias, and relative bias (%bias). The best model was selected for use as the base mixed-effects model. Random parameters for test plots were estimated with different tree selection options. Results show that the Chapman–Richards model had better predictive ability than the other models in terms of R^2_adj (0.81), RMSE (3.7 m), %RMSE (12.9), bias (0.8) and %bias (2.79). Furthermore, calibration based on a selection of four trees from the sample plots reduced bias and RMSE by about 1.6–2.7%. Our results indicate that the calibrated model produced the most accurate results.
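For readers unfamiliar with the Chapman–Richards model, a common height–diameter form is h = 1.3 + a(1 - exp(-b d))^c, where d is diameter at breast height, 1.3 m is breast height, and a, b, c are fitted parameters. The sketch below assumes this form, which the abstract does not spell out, and uses made-up parameter values. The sign convention for bias (observed minus predicted) is also an assumption.

```python
import math


def chapman_richards(dbh_cm, a, b, c, breast_height=1.3):
    """Predicted tree height (m) from diameter at breast height (cm)."""
    return breast_height + a * (1.0 - math.exp(-b * dbh_cm)) ** c


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def bias(observed, predicted):
    """Mean of observed minus predicted; positive means underprediction."""
    n = len(observed)
    return sum(o - p for o, p in zip(observed, predicted)) / n


def percent_of_mean(value, observed):
    """Express an error measure relative to the mean observed value (%)."""
    return 100.0 * value / (sum(observed) / len(observed))


if __name__ == "__main__":
    # Hypothetical parameters and measurements, for illustration only.
    a, b, c = 35.0, 0.04, 1.1
    dbh = [15.0, 30.0, 45.0, 60.0]
    obs = [14.0, 24.0, 29.0, 33.0]
    pred = [chapman_richards(d, a, b, c) for d in dbh]
    err = rmse(obs, pred)
    print(err, percent_of_mean(err, obs), bias(obs, pred))
```

The parameter a sets the asymptotic height above breast height, so predictions level off for large trees rather than growing without bound, which is one reason asymptotic forms are often preferred over linear ones for this relationship.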
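文摘This study explores the relationship between the distribution of Hubei field songs (tiange) and the geographical environment in which they developed, with the aim of providing new ideas and methods for empirical research on regional music. Taking Hubei field songs as the research object, a dataset of 1 248 field song samples was selected, and a geographic information system (GIS) was used to build a database of the preliminarily selected factors influencing field song distribution and musical elements. An analysis model of the field song influencing factor system was built based on random forest and the interpretability algorithm Shapley additive explanations (SHAP). The effectiveness of the model was evaluated with the receiver operating characteristic (ROC) curve, and the relationships between field song distribution, musical elements and the geographical environment were analyzed. The results show that: 1) the field song influencing factor model built on random forest predicts well, with an area under the curve (AUC) of 0.82; 2) the importance ranking of the factors influencing the emergence of field songs and their musical elements indicates that multi-year mean rainfall and multi-year mean temperature are the main factors shaping Hubei field songs. The random forest and SHAP algorithms can predict the distribution pattern of Hubei field songs to a certain extent, which is of significance for research on the association between regional music culture and geography.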
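The AUC reported in this abstract can be read as the probability that a randomly chosen positive sample receives a higher model score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive samples, 0 for negative samples
    scores: model scores, higher meaning more likely positive
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This quadratic-time version is meant for clarity; for large sample sets a rank-based computation gives the same value more efficiently.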