Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected from the response relationship between candidate factors, such as rainfall and reservoir level, and the change in groundwater level. A classification and regression tree (CART) model was then constructed from the selected subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed with the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. These results indicate that the CART model not only has better fitting and generalization ability, but also offers strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
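The two error measures quoted in this abstract can be sketched as follows. The abstract does not define "relative error", so this sketch assumes it is the mean of the per-sample absolute errors divided by the measured values; the function names and the sample levels are made up for illustration.

```python
def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted| over all samples."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def mean_relative_error(measured, predicted):
    """Average of |measured - predicted| / |measured|, returned as a fraction.

    Assumed definition; requires every measured value to be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


# Made-up groundwater levels in metres, for illustration only.
measured = [20.0, 25.0, 40.0]
predicted = [21.0, 24.0, 38.0]

mae = mean_absolute_error(measured, predicted)
mre = mean_relative_error(measured, predicted)
print(f"MAE = {mae:.2f} m, mean relative error = {100 * mre:.2f}%")
```

Reporting both measures is useful because the absolute error is in metres while the relative error is scale-free, so the two models can be compared on either basis.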
Funding: The National Natural Science Foundation of China (No. 51708110).
Abstract: To address the poor generalization ability of the back-propagation (BP) neural network in model updating hybrid tests, the AdaBoost regression tree algorithm is introduced into the model updating procedure. During the learning phase, a regression tree is selected as the weak regression model to be trained, and multiple trained weak regression models are then integrated into a strong regression model. Finally, the training results are generated through voting by all the selected regression models. A 2-DOF nonlinear structure was numerically simulated using the online AdaBoost regression tree algorithm, with the BP neural network algorithm as a contrast. The results show that the prediction accuracy of the online AdaBoost regression algorithm is 48.3% higher than that of the BP neural network algorithm, which verifies that the online AdaBoost regression tree algorithm has better generalization ability than the BP neural network algorithm. Furthermore, it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
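The idea of training weak regression trees and combining them by a weighted vote can be sketched roughly as below. This is not the paper's online algorithm: it is a batch variant in the style of AdaBoost.R2 with a linear loss, it uses one-split trees (stumps) on a single feature, and it fits each stump with sample weights rather than by resampling. All names and the toy data are assumptions made for illustration.

```python
import math


def fit_stump(xs, ys, ws):
    """Weighted least-squares regression stump: (threshold, left_value, right_value)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[:k], order[k:]
        if xs[lo[-1]] == xs[hi[0]]:
            continue  # no threshold can separate equal x values
        wl = sum(ws[i] for i in lo)
        wr = sum(ws[i] for i in hi)
        ml = sum(ws[i] * ys[i] for i in lo) / wl
        mr = sum(ws[i] * ys[i] for i in hi) / wr
        sse = (sum(ws[i] * (ys[i] - ml) ** 2 for i in lo)
               + sum(ws[i] * (ys[i] - mr) ** 2 for i in hi))
        if best is None or sse < best[0]:
            best = (sse, (xs[lo[-1]] + xs[hi[0]]) / 2, ml, mr)
    if best is None:  # all x equal: predict the weighted mean everywhere
        mean = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        return (math.inf, mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def adaboost_r2(xs, ys, rounds=10):
    """Return a list of (stump, vote_weight) pairs."""
    n = len(xs)
    ws = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, ws)
        errs = [abs(y - predict_stump(stump, x)) for x, y in zip(xs, ys)]
        max_err = max(errs)
        if max_err == 0:
            ensemble = [(stump, 1.0)]  # a perfect weak learner needs no partners
            break
        losses = [e / max_err for e in errs]  # linear loss in [0, 1]
        avg_loss = max(sum(w * l for w, l in zip(ws, losses)), 1e-12)
        if avg_loss >= 0.5:
            if not ensemble:
                ensemble.append((stump, 1.0))  # keep at least one learner
            break
        beta = avg_loss / (1.0 - avg_loss)
        ensemble.append((stump, math.log(1.0 / beta)))
        # Well-predicted samples (small loss) are down-weighted the most.
        ws = [w * beta ** (1.0 - l) for w, l in zip(ws, losses)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def predict(ensemble, x):
    """Weighted median of the weak learners' predictions."""
    preds = sorted((predict_stump(s, x), a) for s, a in ensemble)
    half = 0.5 * sum(a for _, a in preds)
    running = 0.0
    for value, a in preds:
        running += a
        if running >= half:
            return value
    return preds[-1][0]


# Toy data, for illustration only.
xs = [float(i) for i in range(10)]
ys = [x * x / 10.0 for x in xs]
ensemble = adaboost_r2(xs, ys, rounds=10)
print(len(ensemble), [round(predict(ensemble, x), 2) for x in xs])
```

Because each round starts from the same uniform weights and a deterministic stump search, a sketch like this has no random weight initialization to depend on, which is the property the abstract contrasts with the BP neural network.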
Abstract: Increased competition, economic recession and financial crises have increased business failure, and researchers have therefore attempted to develop new approaches that yield more accurate and more reliable results. The classification and regression tree (CART) is one of the new modeling techniques developed for this purpose. In this study, the classification and regression trees method is explained and its power to predict financial failure is tested. CART is applied to data on industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The study finds that CART has high predictive power for financial failure one, two and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
Abstract: Urban grid power forecasting is one of the important tasks of power system operators and helps to analyze the development trend of a city. The demand for electricity in various industries is affected by many factors, and data on the relevant influencing factors are scarce, resulting in large deviations in the accuracy of prediction results. To improve the prediction results, this paper proposes a model based on Multi-Target Tree Regression to predict the monthly electricity consumption of different industrial structures. Because the actual electricity consumption data for Shanghai from 2013 to the first half of 2017 have few features, we collect data on GDP growth, weather conditions, and tourism season distribution in various industries in Shanghai, and model and train on the electricity consumption data of different industries in different months. The multi-target tree regression model was tested against actual values to verify its reliability and was used to predict the monthly electricity consumption of each industry in the second half of 2017. The experimental results show that the model can accurately predict the monthly electricity consumption of various industries.
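One common way a regression tree handles several targets at once is to choose each split by the squared error summed over all targets. The sketch below shows a single such split; it is an assumed, generic criterion rather than the paper's exact model, and the feature values and consumption figures are invented for illustration.

```python
def summed_sse(rows):
    """Sum over targets of squared deviations from the per-target mean."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for t in range(len(rows[0])):
        column = [r[t] for r in rows]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column)
    return total


def best_split(X, Y):
    """Return (score, feature, threshold) minimizing the summed SSE of both sides.

    Returns None if no feature has two distinct values.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in zip(X, Y) if x[f] <= threshold]
            right = [y for x, y in zip(X, Y) if x[f] > threshold]
            score = summed_sse(left) + summed_sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


# Invented monthly records: [mean temperature, GDP growth] -> consumption of two industries.
X = [[5, 1.0], [10, 1.2], [25, 0.9], [30, 1.1]]
Y = [[100, 50], [110, 52], [200, 80], [210, 82]]

best = best_split(X, Y)
print(best)  # (score, feature index, threshold)
```

In this toy data the temperature feature separates low-consumption from high-consumption months for both industries at once, so one shared split serves both targets. When targets have very different scales, they would normally be standardized first so that no single target dominates the summed error.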
Funding: This research has been supported by the US National Science Foundation under grant IIS-1115417, the National Natural Science Foundation of China under grants 61728205 and 61472267, and the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant SZS201609.
Abstract: Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables. It has received relatively little attention from the machine learning community, although it arises in many real-world applications. In this paper we conduct extensive experiments to investigate the performance of three representative multi-target regression learning algorithms, namely Multi-Target Stacking (MTS), Random Linear Target Combination (RLTC), and Multi-Objective Random Forest (MORF), comparing them with baseline single-target learning. Our experimental results show that all three multi-target regression learning algorithms improve on the performance of single-target learning. Among them, MTS performs best, followed by RLTC and then MORF. However, single-target learning sometimes still performs very well, and is occasionally the best. This analysis sheds light on multi-target regression learning and indicates that single-target learning is a competitive baseline on multi-target domains.
Abstract: Understanding the underlying structure of phylogenetic trees is very important because it informs the choice of methods for phylogenetic inference. The methods used under a structured population differ from those needed when a population is not structured. In this paper, we compared two supervised machine learning techniques, artificial neural network (ANN) and logistic regression models, for predicting the underlying structure of phylogenetic trees. We carried out parameter tuning to identify optimal models and then performed 10-fold cross-validation on the optimal logistic regression and ANN models. We also applied an unsupervised technique, clustering, to identify the number of clusters that could be identified from simulated phylogenetic trees. The trees were from both structured and non-structured populations. Clustering and prediction using classification techniques were done using tree statistics such as the Colless, Sackin and cophenetic indices, among others. Results from 10-fold cross-validation revealed that the logistic regression and ANN models had comparable results, with both having average accuracy rates of over 0.75. Most of the clustering indices used resulted in 2 or 3 as the optimal number of clusters.
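The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and uses each fold once as the test set. A minimal sketch of the index bookkeeping follows; the function name, the seed, and the sample count of 25 are assumptions for illustration, and the model fitting itself is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

An average accuracy such as the one reported in the abstract would be obtained by fitting the model on each `train` set, scoring it on the matching `test` set, and averaging the ten scores.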
Funding: Financially supported by the National Key R&D Program of China (2021YFD220040403 and 2021YFD220040304) and the China Scholarship Council (202107565021).
Abstract: Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging; the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, complicating the technical capability for delineating vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. Apart from the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables including topography, climate, latitude and soil, although the difference from the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps for the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical for mountains. Apart from providing the first detailed forest vegetation map for any part of WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
Abstract: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely the Boosted Regression Tree (BRT), can address Big Data challenges to drive decision making. The challenge in this study is a lack of interoperability, since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file with inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT deals with missing data by default by allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble approach of BRT could be tested with existing models in order to improve results for a wide range of data-driven decisions and applications.
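The two mechanisms described in this abstract, splitting on whether a value is missing and ranking variables by how often they define a split, can be illustrated with a small sketch. The split and tree representations here are hypothetical (a tree is just a flat list of its splits, and missing values are `None`), the variable names are invented, and real BRT implementations differ in the details.

```python
from collections import Counter


def goes_left(split, row):
    """Route one row through a split.

    A split is ('missing', feature) or ('threshold', feature, value).
    """
    kind, feature = split[0], split[1]
    value = row[feature]
    if kind == "missing":
        return value is None  # left branch collects the rows with a missing value
    # Threshold split: missing values need a default side; here they go right.
    return value is not None and value <= split[2]


def split_frequency(trees):
    """Count how often each feature defines a split across all trees."""
    counts = Counter()
    for tree in trees:
        for split in tree:
            counts[split[1]] += 1
    return counts


# Hypothetical ensemble of two small trees over invented variables.
trees = [
    [("threshold", "rainfall", 3.0), ("missing", "ndvi")],
    [("threshold", "rainfall", 5.0)],
]
row = {"rainfall": 2.0, "ndvi": None}

print(goes_left(trees[0][0], row), goes_left(trees[0][1], row))
print(split_frequency(trees).most_common())
```

Here `rainfall` defines two splits and `ndvi` one, so a frequency-based ranking would place `rainfall` first.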
Abstract: The Arctic region is experiencing accelerated sea ice melt and increased iceberg detachment from glaciers due to climate change. These drifting icebergs present a risk and engineering challenge for subsea installations traversing shallow waters, where iceberg keels may reach the seabed, potentially damaging subsea structures. Consequently, costly and time-intensive iceberg management operations, such as towing and rerouting, are undertaken to safeguard subsea and offshore infrastructure. This study, therefore, explores the application of extra tree regression (ETR) as a robust solution for estimating iceberg draft, particularly in the preliminary phases of decision-making for iceberg management projects. Nine ETR models were developed using parameters influencing iceberg draft. Subsequent analyses identified the most effective models and significant input variables. Uncertainty analysis revealed that the superior ETR model tended to overestimate iceberg drafts; however, it achieved the highest precision, correlation, and simplicity in estimation. Comparison with decision tree regression, random forest regression, and empirical methods confirmed the superior performance of ETR in predicting iceberg drafts.
Abstract: The feasibility of constructing shallow foundations on saturated sands remains uncertain. Seismic design standards simply stipulate that geotechnical investigations for a shallow foundation on such soils shall be conducted to mitigate the effects of the liquefaction hazard. This study investigates the seismic behavior of strip foundations on typical two-layered soil profiles, in which a natural loose sand layer is supported by a dense sand layer. Coupled nonlinear dynamic analyses have been conducted to calculate response parameters, including seismic settlement, the acceleration response on the ground surface, and excess pore pressure beneath strip foundations. A novel liquefaction potential index (LPI_(footing)), based on excess pore pressure ratios across a given region of soil mass beneath footings, is introduced to classify liquefaction severity into three distinct levels: minor, moderate, and severe. To validate the proposed LPI_(footing), the foundation settlement is evaluated for the different liquefaction potential classes. A classification tree model has been grown to predict liquefaction susceptibility, utilizing various input variables, including earthquake intensity on the ground surface, foundation pressure, sand permeability, and top layer thickness. Moreover, a nonlinear regression function has been established to map LPI_(footing) in relation to these input predictors. The models have been constructed using a substantial dataset comprising 13,824 excess pore pressure ratio time histories. The performance of the developed models has been examined using various methods, including 10-fold cross-validation. The predictive capability of the tree has also been validated through existing experimental studies. The results indicate that the classification tree is not only interpretable but also highly predictive, with a testing accuracy of 78.1%. The decision tree provides valuable insights for engineers assessing liquefaction potential beneath strip foundations.
Funding: Supported by the National Natural Science Foundation (60173046) and the Natural Science Foundation of Province (2002AB040)
Abstract: A new point-tree data structure genetic programming (PTGP) method is proposed. For the discontinuous function regression problem, the proposed method is able to identify both the function structure and the discontinuity points simultaneously. It can also be readily applied to regression problems for continuous functions. The numerical experiment results demonstrate that the point-tree GP is an efficient alternative for complex function identification problems.
Abstract: Tree-based models have been widely applied in both academic and industrial settings due to their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. Along these lines, we propose to use the conditional Kendall's τ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects the splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies showed significant improvements over CART trees and even the random forest.
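To make the difference between constant and linear leaf models concrete, the following sketch fits a single split (a stump) to one feature under both leaf types and compares the residual sums of squares. It is only an illustration under stated assumptions: the split is found by exhaustive search over observed values, which is not the residual-association rule or the conditional Kendall's τ criterion proposed in the paper, and all function names and data are hypothetical.

```python
def sse_constant(xs, ys):
    """Residual sum of squares when the leaf predicts the mean of ys."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def sse_linear(xs, ys):
    """Residual sum of squares when the leaf fits y = a + b*x by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        slope = 0.0
    else:
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return sum((y - (mean_y + slope * (x - mean_x))) ** 2
               for x, y in zip(xs, ys))


def best_split(xs, ys, leaf_sse, min_leaf=3):
    """Try every split between sorted neighbours; return (threshold, total SSE)."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best_threshold, best_total = None, float("inf")
    for i in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[i] == sx[i - 1]:
            continue  # no threshold can separate tied x values
        total = leaf_sse(sx[:i], sy[:i]) + leaf_sse(sx[i:], sy[i:])
        if total < best_total:
            best_threshold, best_total = (sx[i - 1] + sx[i]) / 2, total
    return best_threshold, best_total


if __name__ == "__main__":
    # Synthetic piecewise-linear data with a jump at x = 10 (hypothetical).
    xs = [float(i) for i in range(20)]
    ys = [2 * x if x < 10 else 50 - x for x in xs]
    print("constant leaves:", best_split(xs, ys, sse_constant))
    print("linear leaves:  ", best_split(xs, ys, sse_linear))
```

On data that are linear within each segment, the linear-leaf stump recovers the segment boundary with a single split, whereas constant leaves would need many further splits to approximate the same slopes.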
Abstract: The effect of pruning severity on tree growth was analyzed by change point detection using segmented regression. The present study applied this analysis to a well-known published data set including diameter growth response, tree age, pruning severity and pretreatment crown size. First, multiple regression analysis was performed to assess the effect of tree age, pruning severity and pretreatment crown size on diameter growth response. Next, segmented regression analysis was performed to assess the effect of pruning severity on diameter growth response. The results of the multiple regression showed that diameter growth response was significantly influenced by pruning severity and pretreatment crown size. The results of the segmented regression showed that, in the whole data set, an abrupt change toward a decrease in diameter growth response was detected at 25% of the live crown removed. However, in the group of fully crowned, open-grown trees, diameter growth response decreased continuously with increasing pruning severity, with no significant abrupt change, whereas in the group with 70%-90% live crown, diameter growth response did not decrease significantly up to the break point (53% of the crown removed) and then decreased abruptly. This may be the first study to provide a numerical evaluation of the effect of pruning severity on tree growth by change point analysis.
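The kind of break point reported here (no decrease up to a threshold, then a decline) can be illustrated with a plateau-then-slope model, y = a + b * max(0, x - c), where c is the break point. The sketch below estimates c by a simple grid search over candidate values, fitting a and b by least squares for each candidate. This is an assumed, simplified stand-in and not necessarily the estimation procedure used in the study; the function names and the data are hypothetical.

```python
def fit_line(h, y):
    """Least-squares fit of y = a + b*h; returns (a, b, residual sum of squares)."""
    n = len(h)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    shh = sum((hi - mean_h) ** 2 for hi in h)
    if shh == 0:
        b = 0.0
    else:
        b = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y)) / shh
    a = mean_y - b * mean_h
    sse = sum((yi - (a + b * hi)) ** 2 for hi, yi in zip(h, y))
    return a, b, sse


def find_breakpoint(x, y, candidates):
    """Grid search: keep the candidate break point with the smallest SSE."""
    best = None
    for c in candidates:
        # Hinge feature: zero on the plateau, linear past the break point.
        h = [max(0.0, xi - c) for xi in x]
        a, b, sse = fit_line(h, y)
        if best is None or sse < best["sse"]:
            best = {"breakpoint": c, "plateau": a, "slope": b, "sse": sse}
    return best


if __name__ == "__main__":
    # Synthetic data (hypothetical): flat response up to x = 50, then a decline.
    x = [float(v) for v in range(0, 95, 5)]
    y = [10.0 - 0.2 * max(0.0, xi - 50.0) for xi in x]
    candidates = [float(c) for c in range(10, 85, 5)]
    print(find_breakpoint(x, y, candidates))
```

Grid search is easy to verify but scales poorly and gives no confidence interval for the break point; dedicated segmented-regression software typically estimates the break point iteratively and reports its uncertainty.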