Funding: This work was supported in part by the National Natural Science Foundation of China (61601418, 41602362, 61871259), in part by the Opening Foundation of Hunan Engineering and Research Center of Natural Resource Investigation and Monitoring (2020-5), in part by the Qilian Mountain National Park Research Center (Qinghai) (grant number GKQ2019-01), and in part by the Geomatics Technology and Application Key Laboratory of Qinghai Province (grant number QHDX-2019-01).
Abstract: This work generated landslide susceptibility maps for the Three Gorges Reservoir (TGR) area, China, using different machine learning models. Three advanced machine learning methods, namely gradient boosting decision tree (GBDT), random forest (RF) and information value (InV) models, were used, and their performances were assessed and compared. In total, 202 landslides were mapped from a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in generating the landslide susceptibility maps with the GBDT, RF and InV models. All maps of the causative factors were resampled to a resolution of 28.5 m. Of the 486289 pixels in the area, 28526 were landslide pixels and 457763 were non-landslide pixels. Finally, landslide susceptibility maps were generated with the three models, and their performances were assessed through receiver operating characteristic (ROC) curves, sensitivity, specificity, overall accuracy (OA), and the kappa coefficient (KAPPA). The results showed that the GBDT, RF and InV models overall produced reasonably accurate landslide susceptibility maps. Among the three methods, GBDT outperformed the other two and can provide strong technical support for producing landslide susceptibility maps in the TGR area.
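The scalar metrics named in the abstract above (sensitivity, specificity, OA and kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name `confusion_metrics` and the example counts are hypothetical and do not come from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, overall accuracy and Cohen's kappa
    for a binary (landslide / non-landslide) confusion matrix."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of actual landslide pixels found
    specificity = tn / (tn + fp)   # share of actual non-landslide pixels found
    oa = (tp + tn) / total         # overall accuracy (observed agreement)
    # Agreement expected by chance, from the row and column marginals.
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "oa": oa,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(confusion_metrics(tp=40, fp=10, fn=20, tn=30))
```

Kappa is worth reporting alongside OA here because the classes are strongly imbalanced (28526 landslide pixels against 457763 non-landslide pixels), so a high OA alone could be reached by predicting the majority class.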
Abstract: This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns on the Singapore Exchange (SGX). Instead of modeling stock prices in the conventional way, we construct models that relate market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary buy/sell system, we allow three modes of trade, namely buy, sell or stand by. The stand-by case is important because it caters to market conditions in which a model does not produce a strong buy or sell signal. Linear trading models are first developed with a scoring technique that gives higher weight to successful indicators, and with a Least Squares technique that fits the weights to match past perfect trades. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can at times be highly nonlinear, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique, Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index, and statistical tests of randomness and of linear and nonlinear correlation are conducted on the data to check the statistical significance of the inputs and their relation to the output before a model is trained. Our empirical results show that the proposed trading methods are able to generate excess returns compared with the buy-and-hold strategy.
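To make the three-mode decision and the forgetting factor concrete, here is a minimal sketch under stated assumptions. The abstract does not give the decision rule, the threshold, or the exact least-squares formulation, so the symmetric threshold band, the single-indicator fit and the names `trade_decision` and `fit_weight_with_forgetting` are all hypothetical illustrations.

```python
def trade_decision(indicators, weights, threshold):
    """Map a weighted indicator score to buy / sell / stand by.

    Assumption: a symmetric dead band of width `threshold` around zero
    represents the 'no strong signal' case.
    """
    score = sum(w * x for w, x in zip(weights, indicators))
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "stand by"


def fit_weight_with_forgetting(xs, ys, forgetting):
    """Weighted least squares for one indicator (no intercept).

    Observation t gets weight forgetting**(n - 1 - t), so older samples
    count less when forgetting < 1 and equally when forgetting == 1.
    """
    n = len(xs)
    num = 0.0
    den = 0.0
    for t, (x, y) in enumerate(zip(xs, ys)):
        w = forgetting ** (n - 1 - t)
        num += w * x * y
        den += w * x * x
    return num / den


if __name__ == "__main__":
    print(trade_decision([1.0, -1.0], [0.5, 0.2], threshold=0.2))
    print(fit_weight_with_forgetting([1.0, 1.0], [0.0, 1.0], forgetting=0.5))
```

With a forgetting factor below 1, the fitted weight moves toward the most recent target, which is the adaptive behaviour the abstract attributes to the forgetting factor.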
Abstract: BACKGROUND: Severe dengue in children with critical complications has been associated with high mortality rates, varying from approximately 1% to over 20%. To date, there is a lack of data on machine-learning-based algorithms for predicting the risk of in-hospital mortality in children with dengue shock syndrome (DSS). AIM: To develop machine-learning models to estimate the risk of death in hospitalized children with DSS. METHODS: This single-center retrospective study was conducted at the tertiary Children's Hospital No. 2 in Viet Nam between 2013 and 2022. The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit (PICU). Nine significant features were predetermined for further analysis using machine learning models. An oversampling method was used to enhance model performance. Supervised models, including logistic regression, Naïve Bayes, Random Forest (RF), K-nearest neighbors, Decision Tree and Extreme Gradient Boosting (XGBoost), were employed to develop predictive models. Shapley Additive Explanations were used to determine the degree of contribution of the features. RESULTS: In total, 1278 PICU-admitted children with complete data were included in the analysis. The median patient age was 8.1 years (interquartile range: 5.4-10.7). Thirty-nine patients (3%) died. The RF and XGBoost models demonstrated the highest performance. The Shapley Additive Explanations model revealed that the most important predictive features included younger age, female sex, presence of underlying diseases, severe transaminitis, severe bleeding, low platelet counts requiring platelet transfusion, elevated levels of international normalized ratio, blood lactate and serum creatinine, a large volume of resuscitation fluid and a high vasoactive inotropic score (>30). CONCLUSION: We developed robust machine-learning-based models to estimate the risk of death in hospitalized children with DSS. The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
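The abstract does not say which oversampling method was used. As an illustration of the general idea only, the sketch below shows plain random oversampling with replacement, which duplicates minority-class rows until all classes have equal counts. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Draw only from the original rows of this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy data: 8 survivors (0) and 2 deaths (1), for illustration only.
    rows = [[i] for i in range(10)]
    labels = [0] * 8 + [1] * 2
    new_rows, new_labels = random_oversample(rows, labels)
    print(Counter(new_labels))
```

With an outcome as rare as the one reported here (39 deaths among 1278 patients), oversampling should be applied to the training split only, so that duplicated cases do not leak into the evaluation data.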
Funding: This work was funded by the National Key Research and Development Program of China, Strategic International Cooperation in Science and Technology Innovation Program (2018YFE0207800), and the National Natural Science Foundation of China (31971483).
Abstract: The dead fuel moisture content (DFMC) is the key driver leading to fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity, were simultaneously measured in a grassland of Ergun City, Inner Mongolia Autonomous Region of China, in 2021. We chose three regression models, i.e., the random forest (RF) model, the extreme gradient boosting (XGB) model, and the boosted regression tree (BRT) model, to model the seasonal DFMC from the data collected. To ensure accuracy, we added time-lag variables of 3 d to the models. The results showed that, among the three models, the RF model had the best fit, with an R² value of 0.847, and the best prediction accuracy, with a mean absolute error of 4.764%. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of influence of these factors on the DFMC changed with the time lag. Moreover, time-lag variables within 44 h clearly improved the fit and prediction accuracy, indicating that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels and when mapping grassland fire risks based on the DFMC, to help locate high-priority areas for grassland fire monitoring and prevention.
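Time-lag variables of the kind described above are typically built by pairing each observation with earlier values of the same environmental series. The sketch below shows one simple way to do this for a single, evenly sampled variable. The sampling interval, the lag set and the function name `add_lag_features` are assumptions for illustration and are not taken from the study.

```python
def add_lag_features(series, lags):
    """Build rows of [value at t, value at t - lag1, value at t - lag2, ...].

    Rows start at t = max(lags) so that every lagged value exists.
    `lags` are expressed in sampling steps (e.g. hours for hourly data).
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t]] + [series[t - k] for k in lags])
    return rows


if __name__ == "__main__":
    # Hypothetical hourly relative-humidity readings.
    humidity = [10, 11, 12, 13, 14]
    for row in add_lag_features(humidity, lags=[1, 2]):
        print(row)
```

Each row can then be joined with the DFMC measured at the same time step and passed to a regression model, which lets the model use conditions from the preceding hours as predictors.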
Abstract: Multi-level, multi-scale resource selection models using machine learning were compared and contrasted for generating predictive maps of jaguar (Panthera onca) habitat in the Brazilian Pantanal. Multiple spatial scales and temporal movement levels were run within several analytical modeling frameworks for comparison. The analysis included multi-scale raster grains (30 m, 90 m, 180 m, 360 m, 720 m, 1440 m) and GPS collaring temporal movement levels (point, path, and step). Various analytical methods that could accommodate the structural levels of the data (group, individual, case-control) were compared. The models compared included conditional logistic regression, generalized additive modeling (GAM), and classification and regression tree methods, such as random forests (RF) and gradient boosted regression trees (GBM). The goal of the study was to discuss the potential and limitations of machine learning methods that use GPS collaring data to produce predictive habitat suitability maps at the various scales and levels available. Results indicated that choosing the appropriate temporal level and raster scale improved model outputs. Overall, larger-level analytical modeling frameworks and those that used multi-scale raster grains showed the best model evaluation, with the inherent condition that they predict a broader scale and subset of data. The identification of the appropriate spatial scale, temporal scale and statistical model needs careful consideration in predictive mapping efforts.
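Coarser raster grains such as those listed above are often derived from a fine grid by block aggregation, for example a factor of 3 to go from 30 m to 90 m cells. The abstract does not say how the coarser grains were produced, so the block-mean approach and the name `aggregate_mean` below are assumptions used purely for illustration.

```python
def aggregate_mean(grid, factor):
    """Coarsen a 2-D grid by averaging non-overlapping factor x factor blocks.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = [
                grid[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 4 x 4 covariate grid, coarsened by a factor of 2.
    fine = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    print(aggregate_mean(fine, 2))
```

A block mean suits continuous covariates. Categorical layers such as land-cover classes would need a different rule, for example the most frequent class in each block.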