Groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area were used to select the factors that influence groundwater level, based on the response relationship between candidate factors such as rainfall and reservoir level and the change in groundwater level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. The results indicate that the CART model not only has better fitting and generalization ability but also offers strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
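The core of a CART regression model and the two error measures reported in this abstract can be illustrated in a few lines. The following is a minimal sketch, not the authors' model: it fits a single regression split (the building block that CART applies recursively) and computes a mean absolute error and a mean relative error. The function names and all reservoir and groundwater values are hypothetical.

```python
# Minimal sketch of one CART-style regression split plus two error measures.
# All names and data values are hypothetical, not taken from the study.

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = (sum((y - left_mean) ** 2 for y in left)
                + sum((y - right_mean) ** 2 for y in right))
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_error(observed, predicted):
    """Average of |observed - predicted| / |observed| (observed must be non-zero)."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical reservoir level (m) and groundwater level (m) observations.
reservoir_level = [145.0, 146.0, 147.0, 170.0, 172.0, 175.0]
groundwater_level = [150.0, 150.0, 150.0, 160.0, 160.0, 160.0]

threshold, low_mean, high_mean = best_split(reservoir_level, groundwater_level)
predicted = [low_mean if x <= threshold else high_mean for x in reservoir_level]
mae = mean_absolute_error(groundwater_level, predicted)
mre = mean_relative_error(groundwater_level, predicted)
```

A full CART model would apply `best_split` recursively over several predictors and prune the resulting tree; the sketch stops at one split to keep the idea visible.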
To address the poor generalization ability of the back-propagation (BP) neural network in model updating hybrid tests, a novel method, the AdaBoost regression tree algorithm, is introduced into the model updating procedure. During the learning phase, a regression tree is selected as the weak regression model to be trained, and multiple trained weak regression models are then integrated into a strong regression model. Finally, the training results are generated through voting by all the selected regression models. A 2-DOF nonlinear structure was numerically simulated using the online AdaBoost regression tree algorithm, with the BP neural network algorithm as a contrast. The results show that the prediction accuracy of the online AdaBoost regression tree algorithm is 48.3% higher than that of the BP neural network algorithm, which verifies that the online AdaBoost regression tree algorithm generalizes better than the BP neural network algorithm. Furthermore, it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
Increasing competition, economic recession, and financial crises have raised the rate of business failure, and researchers have consequently attempted to develop new approaches that yield more accurate and more reliable results. The classification and regression tree (CART) is one of the new modeling techniques developed for this purpose. In this study, the classification and regression trees method is explained and its power to predict financial failure is tested. CART is applied to data on industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The study finds that CART has high predictive power for financial failure one, two, and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging; the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, complicating the technical capability for delineating vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. In addition to the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables including topography, climate, latitude, and soil, although the difference from the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations, and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps of the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical of mountains. Apart from providing the first detailed forest vegetation map for any part of the WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
Tree-based models have been widely applied in both academic and industrial settings because of their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. To this end, we propose using the conditional Kendall's τ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies showed significant improvements over CART trees and even over the random forest.
The shear strength parameters of soil (cohesion and angle of internal friction) are essential in solving many civil engineering problems. These parameters are usually determined by laboratory tests. The main objective of this work is to evaluate the potential of Artificial Neural Network (ANN) and Regression Tree (CART) techniques for the indirect estimation of these parameters. Four different models, considering different combinations of 6 inputs (gravel %, sand %, silt %, clay %, dry density, and plasticity index), were investigated to evaluate the degree of their effects on the prediction of shear parameters. Performance was evaluated using the Correlation Coefficient and Root Mean Squared Error measures. For the prediction of friction angle, the performance of the two techniques was about the same. For the prediction of cohesion, however, the ANN technique performed better than the CART technique. The model considering all 6 input soil parameters was the most appropriate for the prediction of shear parameters. In addition, connection weight and bias analyses of the best neural network (i.e., 6/2/2) were carried out using the Connection Weight, Garson, and proposed Weight-bias approaches to characterize the influence of input variables on shear strength parameters. The Connection Weight Approach provided the best overall methodology for accurately quantifying variable importance and should be favored over the other approaches examined in this study.
Plant epidemics are often associated with weather-related variables, but it is difficult to identify weather-related predictors for models that predict plant epidemics. To predict Fusarium head blight (FHB) epidemics of wheat, Shah et al. explored a functional approach using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the article by Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.
Water stored in reservoirs serves many crucial functions, including generating hydropower, supporting water supply, and relieving lasting droughts. During floods, water deliveries from reservoirs must be acceptable, so as to guarantee that the gross volume of water is at a safe level and that any release from reservoirs will not trigger flooding downstream. This study aims to develop a well-versed assessment method for managing reservoirs and pre-releasing water outflows using machine learning technology. As a new and exciting AI area, this technology is regarded as the most valuable, time-saving, supervised, and cost-effective approach. In this study, two data-driven forecasting models, Regression Tree (RT) and Support Vector Machine (SVM), were applied to approximately 30 years of hydrological records to simulate reservoir outflows. The SVM and RT models accurately predicted the fluctuations in the water outflows of the Bhakra reservoir. Different input combinations were used to determine the most effective release. For cross-validation, the number of folds was varied. The quadratic SVM with 10 folds and seven different parameters gave the minimum RMSE, maximum R², and minimum MAE; it can therefore be considered the best model for the dataset used in this study.
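The evaluation described in this abstract rests on three standard measures (RMSE, MAE, R²) computed over cross-validation folds. The following is a minimal sketch of those pieces under stated assumptions: the fold scheme here is plain contiguous k-fold without shuffling, which may differ from the study's setup, and the function names and sample values are hypothetical.

```python
import math

# Minimal sketch of RMSE, MAE, R^2 and a contiguous k-fold index splitter.
# Names and sample values are hypothetical, not taken from the study.

def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k):
    """Return a list of (train_indices, test_indices) for k contiguous folds."""
    folds = []
    start = 0
    for i in range(k):
        # Spread the remainder over the first n % k folds.
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical observed and simulated outflows.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(observed, simulated), mae(observed, simulated), r2(observed, simulated))
folds = kfold_indices(10, 5)
```

In a comparison like the one reported, each candidate model would be trained on the `train` indices of every fold and scored on the `test` indices, and the fold-averaged scores would then be compared across models and fold counts.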
Bayesian Additive Regression Trees (BART) is a widely popular nonparametric regression model known for its accurate predictions. In certain situations, prior knowledge suggests the existence of certain dominant variables, but the BART model fails to make full use of this knowledge. To tackle this problem, the paper introduces a modification to BART known as the Partially Fixed BART model. By fixing a portion of the trees' structure, this model uses prior knowledge more efficiently, resulting in enhanced estimation accuracy. Moreover, the Partially Fixed BART model can offer more precise estimates and valuable insights for future analysis even when such prior knowledge is absent. Empirical results substantiate the improvement of the proposed model over the original BART.
The Arctic region is experiencing accelerated sea ice melt and increased iceberg detachment from glaciers due to climate change. These drifting icebergs present a risk and an engineering challenge for subsea installations traversing shallow waters, where iceberg keels may reach the seabed and potentially damage subsea structures. Consequently, costly and time-intensive iceberg management operations, such as towing and rerouting, are undertaken to safeguard subsea and offshore infrastructure. This study therefore explores the application of extra tree regression (ETR) as a robust solution for estimating iceberg draft, particularly in the preliminary phases of decision-making for iceberg management projects. Nine ETR models were developed using parameters that influence iceberg draft. Subsequent analyses identified the most effective models and significant input variables. Uncertainty analysis revealed that the superior ETR model tended to overestimate iceberg drafts; however, it achieved the highest precision, correlation, and simplicity in estimation. Comparison with decision tree regression, random forest regression, and empirical methods confirmed the superior performance of ETR in predicting iceberg drafts.
This paper presents a supervised learning algorithm for retinal vascular segmentation based on the classification and regression tree (CART) algorithm and an improved adaptive boosting (AdaBoost) method. Local binary pattern (LBP) texture features and local features are obtained by extracting, reversing, dilating, and enhancing the green component of retinal images to construct a 17-dimensional feature vector. A dataset is constructed from the feature vectors and the data manually marked by experts. The features are used to generate CART binary trees, which serve as the AdaBoost weak classifiers, and AdaBoost is improved by adding re-judgment functions to form a strong classifier. The proposed algorithm is evaluated on the Digital Retinal Images for Vessel Extraction (DRIVE) dataset. The experimental results show that the proposed algorithm achieves higher segmentation accuracy for blood vessels and that the result contains essentially complete blood vessel details. Moreover, the segmented blood vessel tree has good connectivity and largely reflects the distribution trend of the blood vessels. Compared with the traditional AdaBoost classification algorithm and the support vector machine (SVM) based classification algorithm, the proposed algorithm has higher average accuracy and reliability index, with results similar to those of state-of-the-art segmentation algorithms.
Determining the causal effect of special education is a critical topic when making educational policy that focuses on student achievement. However, current special education research faces challenges from persistent selection bias and complex confounding. Bayesian Additive Regression Trees (BART) is employed in this study to provide a flexible estimation of academic performance. Targeted Maximum Likelihood Estimation (TMLE) is also integrated into the BART model, supporting doubly robust estimation of the special education effect. This study extracted survey data from the Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), to estimate the causal impact of special education status on students' combined mathematics and reading achievement scores. The results of the BART-TMLE model show that children receiving special education services scored approximately 9 points lower on average on the combined math and reading scores than their peers who did not receive these services, even after adjusting for a considerable number of covariates. The estimated negative treatment effect persists after controlling for observed covariates that are closely correlated with the combined test score. The negative effect likely reflects unobserved factors, such as the underlying severity of learning disabilities, parent involvement, and other potential traits that actually determine placement in special education, rather than indicating that special education services are ineffective. The achievement gap in academic performance reflects the current observable status of special education. The estimated effect could be improved by future research that incorporates educational domain knowledge, allowing the model to be constructed more accurately.
Researchers in bioinformatics, biostatistics, and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis, and prognosis, each of which can be formulated as a patient classification problem. In this paper, a new method that uses a tree regression to improve the logistic classification model is introduced for biomarker data analysis. The numerical results show that the linear logistic model can be significantly improved by a tree regression on the residuals. Although this research discusses the classification of binary responses, the idea is easy to extend to the classification of multinomial responses.
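One way to picture "a tree regression on the residuals" of a logistic model is sketched below. This is an illustration under stated assumptions, not the paper's method: the logistic coefficients are assumed to be already fitted, the tree is reduced to a single split on a second biomarker, and the corrected score is simply the logistic probability plus the tree's residual estimate, clipped to [0, 1]. All names, coefficients, and data values are hypothetical.

```python
import math

# Illustrative sketch: correct logistic probabilities with a one-split
# regression tree fitted to the residuals. Coefficients, biomarker values,
# and labels are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Return (threshold, left_mean, right_mean) minimising squared error.

    Requires at least two distinct x values.
    """
    pairs = sorted(zip(xs, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best[1:]


def accuracy(labels, scores):
    """Share of cases where thresholding the score at 0.5 matches the label."""
    hits = sum(1 for y, s in zip(labels, scores) if (1 if s >= 0.5 else 0) == y)
    return hits / len(labels)


# Hypothetical data: biomarker_a feeds the logistic model, biomarker_b the tree.
biomarker_a = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
biomarker_b = [1.0, 1.0, 5.0, 1.0, 5.0, 5.0]
labels = [0, 0, 1, 0, 1, 1]

# Assumed pre-fitted logistic coefficients (intercept, slope).
b0, b1 = -1.4, 2.0
base_prob = [sigmoid(b0 + b1 * a) for a in biomarker_a]

# Fit the stump to the residuals of the logistic model.
residuals = [y - p for y, p in zip(labels, base_prob)]
thr, left_val, right_val = fit_stump(biomarker_b, residuals)

# Add the residual estimate and clip to a valid probability range.
corrected = [
    min(1.0, max(0.0, p + (left_val if b <= thr else right_val)))
    for p, b in zip(base_prob, biomarker_b)
]

base_acc = accuracy(labels, base_prob)
corrected_acc = accuracy(labels, corrected)
```

On this toy data the logistic model alone misclassifies two of six cases, and the residual split on the second biomarker recovers both; real data would call for a held-out evaluation rather than training accuracy.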
Understanding the impact of meteorological and topographical factors on snow cover fraction (SCF) is crucial for water resource management in the Qilian Mountains (QLM), China. However, adequate quantitative analysis of the impact of these factors is still lacking. This study investigated the spatiotemporal characteristics and trends of SCF in the QLM based on the cloud-removed Moderate Resolution Imaging Spectroradiometer (MODIS) SCF dataset for 2000-2021 and conducted a quantitative analysis of the drivers using a histogram-based gradient boosting regression tree (HGBRT) model. The results indicated that the monthly distribution of SCF exhibited a bimodal pattern. SCF was higher in the western regions and lower in the eastern regions. Overall, SCF showed a decreasing trend during 2000-2021. The decrease in SCF occurred at higher elevations, while an increase was observed at lower elevations. At the annual scale, SCF showed a downward trend in the western regions affected by the westerlies (52.84% of the QLM), whereas the opposite trend was observed in the eastern regions affected by the monsoon (45.73% of the QLM). SCF displayed broadly similar spatial patterns in autumn and winter, with a significant decrease in the western regions and a slight increase in the central and eastern regions. The effect of spring SCF on spring surface runoff was more pronounced than that of winter SCF. Furthermore, compared with meteorological factors, a variation of 46.53% in spring surface runoff can be attributed to changes in spring SCF. At the annual scale, temperature and relative humidity were the most important drivers of SCF change. An increase in temperature exceeding 0.04°C/a resulted in a decline in SCF, with a maximum decrease of 0.22%/a. An increase in relative humidity of more than 0.02%/a stabilized the rise in SCF (about 0.06%/a). The impacts of slope and aspect were minimal. At the seasonal scale, the primary factors affecting SCF change varied. In spring, precipitation and wind speed were the primary drivers. In autumn, precipitation and temperature were the primary drivers. In winter, relative humidity and precipitation were the most important drivers. In contrast to the other seasons, slope exerted the strongest influence on SCF change in summer. This study provides a detailed quantitative description of SCF change in the QLM, supporting more effective watershed water resource management and ecological conservation in this region.
Machine learning (ML) has become a powerful tool for accelerating the design and development of new materials. Among traditional ML algorithms, decision tree-based ensemble learning methods are frequently chosen for their strong predictive capabilities. However, in regression tasks decision trees are limited to interpolating within the data range of the training set, which restricts their usefulness for designing materials with enhanced properties. Herein, we focused on predicting and optimizing the L1₂-phase solvus temperature (T_L12) and density, two critical properties of multi-principal-element superalloys (MPESAs). To achieve this, we employed the piecewise symbolic regression tree (PS-Tree), which demonstrates excellent extrapolation capability. Our model successfully predicted high T_L12 values exceeding the training data range (1242℃), with four candidate alloys achieving T_L12 values of 1246, 1249, 1254, and 1274℃. Experimental validation confirmed the accuracy of these predictions, verifying the robust extrapolative capability of the PS-Tree method. Notably, one alloy exhibited a T_L12 of 1267℃ and a density of 7.94 g cm⁻³, outperforming most MPESAs. Another alloy exhibited a compressive yield strength of 897 MPa at 750℃, with a specific yield strength at this temperature higher than that of most L1₂-strengthened alloys and Co/Ni-based superalloys. Moreover, the model provided generalized insights, indicating that alloys with δ_r > 5.3 and ΔH_mix < -12.8 J mol⁻¹ K⁻¹ tend to favor higher T_L12.
Understanding the influencing factors of ecosystem services (ESs) and their relationships is essential for sustainable ecosystem management in degraded alpine ecosystems. Integrated multi-model approaches for exploring the multidimensional influences on ESs and their relationships in alpine ecosystems are lacking. Taking the Daxing'anling forest area, Inner Mongolia (DFAIM) as a case study, this study used the integrated valuation of ecosystem services and trade-offs (InVEST) model to quantify four ESs from 2013 to 2018: soil conservation (SC), water yield (WY), carbon storage (CS), and habitat quality (HQ). We adopted root mean square deviation (RMSD) and coupling coordination degree models (CCDM) to analyze their relationships, and integrated three complementary approaches to reveal multidimensional influencing factors: the optimal parameter-based geographical detector model (OPGDM), the gradient boosting regression tree model (GBRTM), and the quantile regression model (QRM). Key findings include the following: (1) From 2013 to 2018, WY, SC, and HQ declined while CS increased. WY was primarily influenced by mean annual precipitation (MAP), forest ratio (RF), and soil bulk density (SBD); CS and HQ by RF and population density (PD); and SC by slope (S), RF, and MAP. Mean annual temperature (MAT), gross domestic product (GDP), and road network density (RND) showed increasing negative impacts. (2) Low trade-off intensity (TI < 0.15) dominated all ES pairs, with RF, MAP, PD, and normalized difference vegetation index (NDVI) being the dominant factors. The factor interactions primarily showed two-factor enhancement patterns. (3) The average coupling coordination degree (CCD) of the four ESs was low and declined over time, with low-CCD areas becoming increasingly prevalent. RF, S, SBD, and NDVI positively influenced CCD, while PD, MAT, GDP, and RND had increasing negative impacts, with over 62% of the factor interactions exceeding the individual factor effects. In summary, ES supply generally decreased. Local relationships showed moderate coordination, while overall relationships indicated primary dysfunction. Land use and natural factors primarily shaped these ESs and their relationships, while climate and socioeconomic changes diminished ES supply and intensified competition. We recommend enhancing the resilience of natural systems rather than replacing them, establishing climate adaptation monitoring systems, and promoting conservation tillage and cross-departmental coordination mechanisms for collaborative ES optimization. These results provide valuable insights into the sustainable management of alpine ecosystems.
Soil diagnostic horizons, each of which has a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely the classification and regression tree (CART), random forest (RF), boosted regression trees (BRT), and support vector machine (SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. Resampling and ensemble techniques significantly improved prediction accuracy, measured by the area under the receiver operating characteristic curve (AUC), and produced more stable results for the BRT (AUC of 0.921 ± 0.012, mean ± standard deviation) and RF (0.908 ± 0.013) algorithms compared with the CART algorithm (0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded an AUC value (0.906 ± 0.006) comparable to those of the RF and BRT algorithms, it is sensitive to parameter settings, and tuning them is extremely time-consuming. Therefore, we consider it inadequate for occurrence-distribution modeling. Considering its high prediction accuracy, robustness to parameter settings, ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.
Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater, leading to chemical and biological reactions that could negatively affect groundwater quality and contribute to atmospheric emissions. Knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data points for methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely regression tree (RT) and boosted regression tree (BRT), tuned with a Bayesian optimization algorithm (BO), were employed to determine the solubility of the gases. The predictions were compared with the experimental data as well as with four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate and that the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R²) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97 × 10⁻⁸. The leverage statistical approach further confirmed the validity of the developed model.
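The boosted regression tree idea mentioned in this abstract can be sketched as gradient boosting with squared-error loss, where each round fits a small tree to the current residuals. The following is a minimal illustration, not the study's BRT-BO model: the trees are single splits, the learning rate and round count are fixed by hand rather than tuned by Bayesian optimization, and the pressure and solubility values are hypothetical.

```python
# Minimal gradient-boosting sketch with one-split regression trees.
# Hyperparameters and data are hypothetical; no Bayesian tuning is performed.

def fit_stump(xs, targets):
    """Return (threshold, left_mean, right_mean) minimising squared error.

    Requires at least two distinct x values.
    """
    pairs = sorted(zip(xs, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.1):
    """Fit an additive model: mean of ys plus shrunken stumps on residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if x <= thr else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Evaluate the boosted model at a single input value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= thr else rv for thr, lv, rv in stumps)


def mse(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical pressure (MPa) and gas solubility values.
pressure = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
solubility = [0.5, 0.9, 1.2, 1.4, 1.55, 1.65, 1.7, 1.72]

model = boost(pressure, solubility)
baseline_mse = mse(solubility, [model[0]] * len(solubility))
boosted_mse = mse(solubility, [predict(model, x) for x in pressure])
```

With squared-error loss and a learning rate between 0 and 1, each round cannot increase the training error, so the boosted fit improves on the constant baseline; judging a real model would require held-out data, as in the comparison against experimental values described above.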
Funding (Shuping landslide groundwater level study): supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Funding (AdaBoost regression tree hybrid test study): The National Natural Science Foundation of China (No. 51708110).
文摘In order to solve the poor generalization ability of the back-propagation(BP)neural network in the model updating hybrid test,a novel method called the AdaBoost regression tree algorithm is introduced into the model updating procedure in hybrid tests.During the learning phase,the regression tree is selected as a weak regression model to be trained,and then multiple trained weak regression models are integrated into a strong regression model.Finally,the training results are generated through voting by all the selected regression models.A 2-DOF nonlinear structure was numerically simulated by utilizing the online AdaBoost regression tree algorithm and the BP neural network algorithm as a contrast.The results show that the prediction accuracy of the online AdaBoost regression algorithm is 48.3%higher than that of the BP neural network algorithm,which verifies that the online AdaBoost regression tree algorithm has better generalization ability compared to the BP neural network algorithm.Furthermore,it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
Abstract: Increasing competition, economic recession, and financial crises have increased business failures, and researchers have therefore attempted to develop new approaches that yield more accurate and more reliable results. The classification and regression tree (CART) is one of the new modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data on industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The results show that CART has high predictive power for financial failure one, two, and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
Funding: Financially supported by the National Key R&D Program of China (2021YFD220040403 and 2021YFD220040304) and the China Scholarship Council (202107565021).
Abstract: Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging; the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, complicating the technical capability for delineating vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. Apart from the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables including topography, climate, latitude and soil, although the difference with the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps for the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical for mountains. Apart from providing the first detailed forest vegetation map for any part of WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
Abstract: Tree-based models have been widely applied in both academic and industrial settings due to their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. Along these lines, we propose to use the conditional Kendall's τ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects the splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies showed significant improvements over CART trees and even the random forest.
Abstract: The shear strength parameters of soil (cohesion and angle of internal friction) are essential in solving many civil engineering problems. Laboratory tests are used to determine these parameters. The main objective of this work is to evaluate the potential of Artificial Neural Network (ANN) and Regression Tree (CART) techniques for the indirect estimation of these parameters. Four different models, considering different combinations of 6 inputs (gravel %, sand %, silt %, clay %, dry density, and plasticity index), were investigated to evaluate the degree of their effects on the prediction of shear parameters. A performance evaluation was carried out using Correlation Coefficient and Root Mean Squared Error measures. It was observed that for the prediction of friction angle, the performance of both techniques is about the same. However, for the prediction of cohesion, the ANN technique performs better than the CART technique. It was further observed that the model considering all 6 input soil parameters is the most appropriate model for the prediction of shear parameters. Also, connection weight and bias analyses of the best neural network (i.e., 6/2/2) were attempted using the Connection Weight, Garson, and proposed Weight-bias approaches to characterize the influence of input variables on shear strength parameters. It was observed that the Connection Weight Approach provides the best overall methodology for accurately quantifying variable importance, and should be favored over the other approaches examined in this study.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12071173 and 12171192) and the Huaian Key Laboratory for Infectious Diseases Control and Prevention (HAP201704).
Abstract: Plant epidemics are often associated with weather-related variables. It is difficult to identify weather-related predictors for models predicting plant epidemics. To predict Fusarium head blight (FHB) epidemics of wheat, Shah et al. explored a functional approach using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the work of Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.
Abstract: Water stored in reservoirs serves many crucial functions, including generating hydropower, supporting water supply, and relieving lasting droughts. During floods, water deliveries from reservoirs must be acceptable, so as to guarantee that the gross volume of water is at a safe level and that any release from reservoirs will not trigger flooding downstream. This study aims to develop a sound assessment method for managing reservoirs and pre-releasing water outflows by using machine learning technology. As a new and exciting AI area, this technology is regarded as the most valuable, time-saving, supervised and cost-effective approach. In this study, two data-driven forecasting models, i.e., Regression Tree (RT) and Support Vector Machine (SVM), were applied to approximately 30 years of hydrological records to simulate reservoir outflows. The SVM and RT models accurately predicted the fluctuations in the water outflows of the Bhakra reservoir. Different input combinations were used to determine the most effective release. For cross-validation, the number of folds was varied. It is found that quadratic SVM with 10 folds and seven different parameters gives the minimum RMSE, maximum R², and minimum MAE; therefore, it can be considered the best model for the dataset used in this study.
Abstract: Bayesian Additive Regression Trees (BART) is a widely popular nonparametric regression model known for its accurate prediction capabilities. In certain situations, there is knowledge suggesting the existence of certain dominant variables. However, the BART model fails to fully utilize this knowledge. To tackle this problem, the paper introduces a modification to BART known as the Partially Fixed BART model. By fixing a portion of the trees' structure, this model enables more efficient utilization of prior knowledge, resulting in enhanced estimation accuracy. Moreover, the Partially Fixed BART model can offer more precise estimates and valuable insights for future analysis even when such prior knowledge is absent. Empirical results substantiate the improvement of the proposed model over the original BART.
Abstract: The Arctic region is experiencing accelerated sea ice melt and increased iceberg detachment from glaciers due to climate change. These drifting icebergs present a risk and engineering challenge for subsea installations traversing shallow waters, where iceberg keels may reach the seabed, potentially damaging subsea structures. Consequently, costly and time-intensive iceberg management operations, such as towing and rerouting, are undertaken to safeguard subsea and offshore infrastructure. This study, therefore, explores the application of extra tree regression (ETR) as a robust solution for estimating iceberg draft, particularly in the preliminary phases of decision-making for iceberg management projects. Nine ETR models were developed using parameters influencing iceberg draft. Subsequent analyses identified the most effective models and significant input variables. Uncertainty analysis revealed that the superior ETR model tended to overestimate iceberg drafts; however, it achieved the highest precision, correlation, and simplicity in estimation. Comparison with decision tree regression, random forest regression, and empirical methods confirmed the superior performance of ETR in predicting iceberg drafts.
Funding: National Natural Science Foundation of China (No. 61163010).
Abstract: This paper presents a supervised learning algorithm for retinal vascular segmentation based on the classification and regression tree (CART) algorithm and improved adaptive boosting (AdaBoost). Local binary patterns (LBP) texture features and local features are obtained by extracting, reversing, dilating, and enhancing the green components of retinal images to construct a 17-dimensional feature vector. A dataset is constructed from the feature vectors and the data manually marked by experts. The features are used to generate a CART binary tree, which serves as the AdaBoost weak classifier, and AdaBoost is improved by adding re-judgment functions to form a strong classifier. The proposed algorithm is simulated on the Digital Retinal Images for Vessel Extraction (DRIVE) dataset. The experimental results show that the proposed algorithm has higher segmentation accuracy for blood vessels, and the result contains nearly complete blood vessel details. Moreover, the segmented blood vessel tree has good connectivity, which largely reflects the distribution trend of blood vessels. Compared with the traditional AdaBoost classification algorithm and the support vector machine (SVM) based classification algorithm, the proposed algorithm has higher average accuracy and reliability index, and its results are similar to those of the state-of-the-art segmentation algorithm.
Abstract: Determining the causal effect of special education is a critical topic when making educational policy that focuses on student achievement. However, current special education research faces challenges from persistent selection bias and complex confounding. Bayesian Additive Regression Trees (BART) is employed in this study to provide a flexible estimation of academic performance. Targeted Maximum Likelihood Estimation (TMLE) is also integrated into the BART model, supporting doubly robust estimation of the special education effect. This study extracted survey data from the Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), to estimate the causal impact of special education status on students' combined mathematics and reading achievement scores. The results of the BART-TMLE model show that children receiving special education services scored approximately 9 points lower on average on combined math and reading, even after adjusting for a considerable number of covariates, compared to their peers who did not receive these services. The estimated negative treatment effect persists after controlling for observed covariates that are closely correlated with the combined test score. The negative effect likely reflects unobserved factors, such as the underlying severity of learning disabilities, parent involvement and other potential traits, which are the actual factors that determine placement in special education, rather than indicating the ineffectiveness of special education services. The achievement gap in academic performance reflects the current observable status of special education. The estimated effect could be improved by future research incorporating educational domain knowledge, allowing the model to be constructed more accurately.
Abstract: Researchers in bioinformatics, biostatistics and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis and prognosis, which can be formulated as patient classification problems. In this paper, a new method that uses tree regression to improve the logistic classification model is introduced for biomarker data analysis. The numerical results show that the linear logistic model can be significantly improved by a tree regression on the residuals. Although this research discusses the classification of binary responses, the idea is easy to extend to the classification of multinomial responses.
Funding: Funded by the Key Research and Development Project for Ecological Civilization Construction in Gansu Province (24YFFA010), the Gansu Province Major Science and Technology Project (22ZD6FA005), the Natural Science Foundation of Gansu Province (24JRRA091), the Shanxi Province Basic Research Program (Free Exploration Category) Youth Project (202403021212316), and the Science and Technology Innovation Program for Universities in Shanxi Province (2024L327).
Abstract: Understanding the impact of meteorological and topographical factors on snow cover fraction (SCF) is crucial for water resource management in the Qilian Mountains (QLM), China. However, adequate quantitative analysis of the impact of these factors is still lacking. This study investigated the spatiotemporal characteristics and trends of SCF in the QLM based on the cloud-removed Moderate Resolution Imaging Spectroradiometer (MODIS) SCF dataset for 2000-2021 and conducted a quantitative analysis of the drivers using a histogram-based gradient boosting regression tree (HGBRT) model. The results indicated that the monthly distribution of SCF exhibited a bimodal pattern. The SCF showed a pattern of higher values in the western regions and lower values in the eastern regions. Overall, the SCF showed a decreasing trend during 2000-2021. The decrease in SCF occurred at higher elevations, while an increase was observed at lower elevations. At the annual scale, the SCF showed a downward trend in the western regions affected by the westerlies (52.84% of the QLM). However, the opposite trend was observed in the eastern regions affected by the monsoon (45.73% of the QLM). The SCF displayed broadly similar spatial patterns in autumn and winter, with a significant decrease in the western regions and a slight increase in the central and eastern regions. The effect of spring SCF on spring surface runoff was more pronounced than that of winter SCF. Furthermore, compared with meteorological factors, a variation of 46.53% in spring surface runoff can be attributed to changes in spring SCF. At the annual scale, temperature and relative humidity were the most important drivers of SCF change. An increase in temperature exceeding 0.04°C/a was observed to result in a decline in SCF, with a maximum decrease of 0.22%/a. An increase in relative humidity of more than 0.02%/a stabilized the rise in SCF (about 0.06%/a). The impacts of slope and aspect were found to be minimal. At the seasonal scale, the primary factors impacting SCF change varied. In spring, precipitation and wind speed emerged as the primary drivers. In autumn, precipitation and temperature were identified as the primary drivers. In winter, relative humidity and precipitation were the most important drivers. In contrast to the other seasons, slope exerted the strongest influence on SCF change in summer. This study facilitates a detailed quantitative description of SCF change in the QLM, enhancing the effectiveness of watershed water resource management and ecological conservation efforts in this region.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 52371007 and 52301042), the National Key R&D Program of China (No. 2020YFB0704503), the Shenzhen Science and Technology Program (No. SGDX20210823104002016), the Guangdong Basic and Applied Basic Research Foundation (No. 2021B1515120071), and the Shenzhen Basic Research Project (No. JCYJ20241202123504007).
Abstract: Machine learning (ML) has become a powerful tool for accelerating the design and development of new materials. Among various traditional ML algorithms, decision tree-based ensemble learning methods are frequently chosen for their strong predictive capabilities. However, decision trees are limited in regression tasks to interpolating within the data range of the training set, which restricts their usefulness for designing materials with enhanced properties. Herein, we focused on predicting and optimizing the L1_(2)-phase solvus temperature (T_(L12)) and density, two critical properties for multi-principal-element superalloys (MPESAs). To achieve this, we employed the piecewise symbolic regression tree (PS-Tree), which demonstrates excellent extrapolation capability. Our model successfully predicted high T_(L12) values exceeding the training data range (1242℃), with four candidate alloys achieving T_(L12) values of 1246, 1249, 1254, and 1274℃. Experimental validation confirmed the accuracy of these predictions, verifying the robust extrapolative capability of the PS-Tree method. Notably, one alloy exhibited a T_(L12) of 1267℃ and a density of 7.94 g cm^(-3), outperforming most MPESAs. Additionally, another alloy exhibited a compressive yield strength of 897 MPa at 750℃, with a specific yield strength at this temperature higher than that of most L1_(2)-strengthened alloys and Co/Ni-based superalloys. Moreover, the model provided generalized insights, indicating that alloys with δ_(r) > 5.3 and ΔH_(mix) < -12.8 J mol^(-1) K^(-1) tend to favor higher T_(L12).
Funding: Funded primarily by the Central Public Welfare Research Institutes Basic Research Business Funds to Support the Administration's Central Work Project (Grant No. CAFYBB2023ZA003-4), the National Natural Science Foundation of China (Grant Nos. 31170593 and 31570633), and the National Forestry and Grassland Administration under the project "Forestry Major Issues Research" (Grant Nos. 500102-1776 and 500102-5110).
Abstract: Understanding the influencing factors of ecosystem services (ESs) and their relationships is essential for sustainable ecosystem management in degraded alpine ecosystems. There is a lack of integrated multi-model approaches to explore the multidimensional influences on ESs and their relationships in alpine ecosystems. Taking the Daxing'anling forest area, Inner Mongolia (DFAIM) as a case study, this study used the integrated valuation of ecosystem services and trade-offs (InVEST) model to quantify four ESs, namely soil conservation (SC), water yield (WY), carbon storage (CS), and habitat quality (HQ), from 2013 to 2018. We adopted root mean square deviation (RMSD) and coupling coordination degree models (CCDM) to analyze their relationships, and integrated three complementary approaches, namely the optimal parameter-based geographical detector model (OPGDM), the gradient boosting regression tree model (GBRTM), and the quantile regression model (QRM), to reveal multidimensional influencing factors. Key findings include the following: (1) From 2013 to 2018, WY, SC, and HQ declined while CS increased. WY was primarily influenced by mean annual precipitation (MAP), forest ratio (RF), and soil bulk density (SBD); CS and HQ by RF and population density (PD); and SC by slope (S), RF, and MAP. Mean annual temperature (MAT), gross domestic product (GDP), and road network density (RND) showed increasing negative impacts. (2) Low trade-off intensity (TI < 0.15) dominated all ES pairs, with RF, MAP, PD, and normalized difference vegetation index (NDVI) being the dominant factors. The factor interactions primarily showed two-factor enhancement patterns. (3) The average coupling coordination degree (CCD) of the four ESs was low and declined over time, with low-CCD areas becoming increasingly prevalent. RF, S, SBD, and NDVI positively influenced CCD, while PD, MAT, GDP, and RND had increasing negative impacts, with over 62% of the factor interactions exceeding the individual factor effects. In summary, ES supply generally decreased. Local relationships showed moderate coordination, while overall relationships indicated primary dysfunction. Land use and natural factors primarily shaped these ESs and their relationships, while climate and socioeconomic changes diminished ES supply and intensified competition. We recommend enhancing the resilience of natural systems rather than replacing them, establishing climate adaptation monitoring systems, and promoting conservation tillage and cross-departmental coordination mechanisms for collaborative ES optimization. These results provide valuable insights into the sustainable management of alpine ecosystems.
Funding: Supported by the National Natural Science Foundation of China (Nos. 41501229, 41371224, 41130530, and 91325301) and the China Postdoctoral Science Foundation (No. 2015M581876).
Abstract: Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely, the classification and regression tree (CART), random forest (RF), boosted regression trees (BRT), and support vector machine (SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. The mechanisms of resampling and ensemble techniques significantly improved prediction accuracies (measured by the area under the receiver operating characteristic curve (AUC)) and produced more stable results for the BRT (AUC of 0.921 ± 0.012, mean ± standard deviation) and RF (0.908 ± 0.013) algorithms compared to the CART algorithm (0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded an AUC value (0.906 ± 0.006) comparable to those of the RF and BRT algorithms, it is sensitive to parameter settings, and tuning them is extremely time-consuming. Therefore, we consider it inadequate for occurrence-distribution modeling. Considering the obvious advantages of high prediction accuracy, robustness to parameter settings, the ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide an insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.
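The AUC score used to compare the four algorithms can be illustrated with a minimal sketch. It relies on the standard interpretation of AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name, labels, and scores below are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for presence (positive), 0 for absence (negative).
    scores: predicted occurrence scores, higher meaning more likely present.
    Assumes at least one positive and one negative sample.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.784 to 0.921 should be read.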
Abstract: Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater, leading to chemical and biological reactions, which could negatively affect groundwater quality and contribute to atmospheric emissions. Knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data points of methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely regression tree (RT) and boosted regression tree (BRT) tuned with a Bayesian optimization algorithm (BO), were employed to determine the solubility of gases. The predictions were compared with the experimental data as well as four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate, and the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R²) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97×10^(-8). The leverage statistical approach further confirmed the validity of the model developed.
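The general idea behind a boosted regression tree can be sketched with a small least-squares gradient boosting loop. This is not the model from the study, whose implementation and tuning are not described here. It is a generic illustration that assumes a single input feature, depth-1 trees (stumps), a fixed learning rate, and no Bayesian optimization. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature: returns (threshold, left_mean, right_mean).

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:                 # split rule: x <= t goes left
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit stumps one after another, each to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)                       # start from the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data (y = x squared), unrelated to the solubility database.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
print(mse(boost(xs, ys, n_trees=0), xs, ys), mse(boost(xs, ys, n_trees=50), xs, ys))
```

The number of trees and the learning rate are the kind of hyperparameters a tuning procedure such as Bayesian optimization would search over. Here they are simply fixed.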