The feasibility of constructing shallow foundations on saturated sands remains uncertain.Seismic design standards simply stipulate that geotechnical investigations for a shallow foundation on such soils shall be condu...The feasibility of constructing shallow foundations on saturated sands remains uncertain.Seismic design standards simply stipulate that geotechnical investigations for a shallow foundation on such soils shall be conducted to mitigate the effects of the liquefaction hazard.This study investigates the seismic behavior of strip foundations on typical two-layered soil profiles-a natural loose sand layer supported by a dense sand layer.Coupled nonlinear dynamic analyses have been conducted to calculate response parameters,including seismic settlement,the acceleration response on the ground surface,and excess pore pressure beneath strip foundations.A novel liquefaction potential index(LPI_(footing)),based on excess pore pressure ratios across a given region of soil mass beneath footings is introduced to classify liquefaction severity into three distinct levels:minor,moderate,and severe.To validate the proposed LPI_(footing),the foundation settlement is evaluated for the different liquefaction potential classes.A classification tree model has been grown to predict liquefaction susceptibility,utilizing various input variables,including earthquake intensity on the ground surface,foundation pressure,sand permeability,and top layer thickness.Moreover,a nonlinear regression function has been established to map LPI_(footing) in relation to these input predictors.The models have been constructed using a substantial dataset comprising 13,824 excess pore pressure ratio time histories.The performance of the developed models has been examined using various methods,including the 10-fold cross-validation method.The predictive capability of the tree also has been validated through existing experimental studies.The results indicate that the classification tree is not only interpretable but also highly predictive,with a testing accuracy level of 78.1%.The decision tree provides valuable insights for engineers assessing liquefaction potential beneath strip foundations.展开更多
针对卡方自动交互诊断(CHAID)决策树易过拟合的问题,提出CHAID随机森林方法(CHAID Random Forest,CHAID-RF)。该方法采用随机采样、随机选择特征以及集成的策略,将CHAID决策树作为基分类器,形成CHAID-RF。为了验证CHAID-RF的有效性,选取...针对卡方自动交互诊断(CHAID)决策树易过拟合的问题,提出CHAID随机森林方法(CHAID Random Forest,CHAID-RF)。该方法采用随机采样、随机选择特征以及集成的策略,将CHAID决策树作为基分类器,形成CHAID-RF。为了验证CHAID-RF的有效性,选取CART、CHAID、SVM、RF作为对比算法,以准确率、加权查准率、加权查全率、加权F值作为分类模型评价指标,以均方根误差作为回归模型评价指标,采用10个分类数据集和7个回归数据集进行验证。实验结果表明CHAID-RF可行有效。展开更多
The main objective of this research is to determine the capacity of land cover classification combining spec- tral and textural features of Landsat TM imagery with ancillary geographical data in wetlands of the Sanjia...The main objective of this research is to determine the capacity of land cover classification combining spec- tral and textural features of Landsat TM imagery with ancillary geographical data in wetlands of the Sanjiang Plain, Heilongjiang Province, China. Semi-variograms and Z-test value were calculated to assess the separability of grey-level co-occurrence texture measures to maximize the difference between land cover types. The degree of spatial autocorrelation showed that window sizes of 3×3 pixels and 11×11 pixels were most appropriate for Landsat TM im- age texture calculations. The texture analysis showed that co-occurrence entropy, dissimilarity, and variance texture measures, derived from the Landsat TM spectrum bands and vegetation indices provided the most significant statistical differentiation between land cover types. Subsequently, a Classification and Regression Tree (CART) algorithm was applied to three different combinations of predictors: 1) TM imagery alone (TM-only); 2) TM imagery plus image texture (TM+TXT model); and 3) all predictors including TM imagery, image texture and additional ancillary GIS in- formation (TM+TXT+GIS model). Compared with traditional Maximum Likelihood Classification (MLC) supervised classification, three classification trees predictive models reduced the overall error rate significantly. Image texture measures and ancillary geographical variables depressed the speckle noise effectively and reduced classification error rate of marsh obviously. For classification trees model making use of all available predictors, omission error rate was 12.90% and commission error rate was 10.99% for marsh. The developed method is portable, relatively easy to im- plement and should be applicable in other settings and over larger extents.展开更多
Urban tree species provide various essential ecosystem services in cities,such as regulating urban temperatures,reducing noise,capturing carbon,and mitigating the urban heat island effect.The quality of these services...Urban tree species provide various essential ecosystem services in cities,such as regulating urban temperatures,reducing noise,capturing carbon,and mitigating the urban heat island effect.The quality of these services is influenced by species diversity,tree health,and the distribution and the composition of trees.Traditionally,data on urban trees has been collected through field surveys and manual interpretation of remote sensing images.In this study,we evaluated the effectiveness of multispectral airborne laser scanning(ALS)data in classifying 24 common urban roadside tree species in Espoo,Finland.Tree crown structure information,intensity features,and spectral data were used for classification.Eight different machine learning algorithms were tested,with the extra trees(ET)algorithm performing the best,achieving an overall accuracy of 71.7%using multispectral LiDAR data.This result highlights that integrating structural and spectral information within a single framework can improve the classification accuracy.Future research will focus on identifying the most important features for species classification and developing algorithms with greater efficiency and accuracy.展开更多
Despite the widespread use of Decision trees (DT) across various applications, their performance tends to suffer when dealing with imbalanced datasets, where the distribution of certain classes significantly outweighs...Despite the widespread use of Decision trees (DT) across various applications, their performance tends to suffer when dealing with imbalanced datasets, where the distribution of certain classes significantly outweighs others. Cost-sensitive learning is a strategy to solve this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms, which are heuristic, tried to greedily select either a better splitting point or feature node, leading to local optima for tree nodes and ignoring the cost of the whole tree. In addition, determination of the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experimental results using datasets compared with state-of-the-art DT algorithms show that the SCDT method achieved the highest performance on most datasets. Moreover, SCDT also excels in other critical performance metrics, such as recall, precision, F1-score, and AUC, with notable results with average values of 83%, 87.3%, 85%, and 80.7%, respectively.展开更多
This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric feature...This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as key features contributing to variance. However, while PCA effectively reduced dimensionality, it faced challenges in clearly distinguishing between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected Height and Reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model’s accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.展开更多
Flood disasters can have a serious impact on people's production and lives, and can cause hugelosses in lives and property security. Based on multi-source remote sensing data, this study establisheddecision tree c...Flood disasters can have a serious impact on people's production and lives, and can cause hugelosses in lives and property security. Based on multi-source remote sensing data, this study establisheddecision tree classification rules through multi-source and multi-temporal feature fusion, classified groundobjects before the disaster and extracted flood information in the disaster area based on optical imagesduring the disaster, so as to achieve rapid acquisition of the disaster situation of each disaster bearing object.In the case of Qianliang Lake, which suffered from flooding in 2020, the results show that decision treeclassification algorithms based on multi-temporal features can effectively integrate multi-temporal and multispectralinformation to overcome the shortcomings of single-temporal image classification and achieveground-truth object classification.展开更多
Leaf disease identification is one of the most promising applications of convolutional neural networks(CNNs).This method represents a significant step towards revolutionizing agriculture by enabling the quick and accu...Leaf disease identification is one of the most promising applications of convolutional neural networks(CNNs).This method represents a significant step towards revolutionizing agriculture by enabling the quick and accurate assessment of plant health.In this study,a CNN model was specifically designed and tested to detect and categorize diseases on fig tree leaves.The researchers utilized a dataset of 3422 images,divided into four classes:healthy,fig rust,fig mosaic,and anthracnose.These diseases can significantly reduce the yield and quality of fig tree fruit.The objective of this research is to develop a CNN that can identify and categorize diseases in fig tree leaves.The data for this study was collected from gardens in the Amandi and Mamash Khail Bannu districts of the Khyber Pakhtunkhwa region in Pakistan.To minimize the risk of overfitting and enhance the model’s performance,early stopping techniques and data augmentation were employed.As a result,the model achieved a training accuracy of 91.53%and a validation accuracy of 90.12%,which are considered respectable.This comprehensive model assists farmers in the early identification and categorization of fig tree leaf diseases.Our experts believe that CNNs could serve as valuable tools for accurate disease classification and detection in precision agriculture.We recommend further research to explore additional data sources and more advanced neural networks to improve the model’s accuracy and applicability.Future research will focus on expanding the dataset by including new diseases and testing the model in real-world scenarios to enhance sustainable farming practices.展开更多
Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan containing 93,925 samples per feature(seven well logs and one facies log) were used to classify four facies. Data preprocessing ...Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan containing 93,925 samples per feature(seven well logs and one facies log) were used to classify four facies. Data preprocessing and preparation involve two processes: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression(LR), Decision Tree(DT), Support Vector Machine(SVM),Random Forest(RF), and Gradient Boosting(GB) for classification, were tested using different iterations and various combinations of features and parameters. The support vector radial kernel training model achieved an accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree(DT) Hyperparameter Optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of67.89%. The model's validation was carried out using the blind well validation approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. During training, the Gradient Boosting classifier achieved an accuracy score of 71.57%, and during testing, it achieved 69.89%. The Grid Search model achieved a higher accuracy score of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy score, with only 66.13% for training and66.12% for testing. For validation, the Gradient Boosting(GB) classifier model achieved an accuracy score of 75.41% on the blind well test, while the Gradient Boosting with Grid Search achieved an accuracy score of 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating the potential for overfitting. Random Forest(RF) and Gradient Boosting(GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and data nature.展开更多
To deal with the problem that arises when the conventional fuzzy class-association method applies repetitive scans of the classifier to classify new texts,which has low efficiency, a new approach based on the FCR-tree...To deal with the problem that arises when the conventional fuzzy class-association method applies repetitive scans of the classifier to classify new texts,which has low efficiency, a new approach based on the FCR-tree(fuzzy classification rules tree)for text categorization is proposed.The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules.In comparison with classification rules,the fuzzy classification rules contain not only words,but also the fuzzy sets corresponding to the frequencies of words appearing in texts.Therefore,the construction of an FCR-tree and its structure are different from a CR-tree.To debase the difficulty of FCR-tree construction and rules retrieval,more k-FCR-trees are built.When classifying a new text,it is not necessary to search the paths of the sub-trees led by those words not appearing in this text,thus reducing the number of traveling rules.Experimental results show that the proposed approach obviously outperforms the conventional method in efficiency.展开更多
文摘The feasibility of constructing shallow foundations on saturated sands remains uncertain.Seismic design standards simply stipulate that geotechnical investigations for a shallow foundation on such soils shall be conducted to mitigate the effects of the liquefaction hazard.This study investigates the seismic behavior of strip foundations on typical two-layered soil profiles-a natural loose sand layer supported by a dense sand layer.Coupled nonlinear dynamic analyses have been conducted to calculate response parameters,including seismic settlement,the acceleration response on the ground surface,and excess pore pressure beneath strip foundations.A novel liquefaction potential index(LPI_(footing)),based on excess pore pressure ratios across a given region of soil mass beneath footings is introduced to classify liquefaction severity into three distinct levels:minor,moderate,and severe.To validate the proposed LPI_(footing),the foundation settlement is evaluated for the different liquefaction potential classes.A classification tree model has been grown to predict liquefaction susceptibility,utilizing various input variables,including earthquake intensity on the ground surface,foundation pressure,sand permeability,and top layer thickness.Moreover,a nonlinear regression function has been established to map LPI_(footing) in relation to these input predictors.The models have been constructed using a substantial dataset comprising 13,824 excess pore pressure ratio time histories.The performance of the developed models has been examined using various methods,including the 10-fold cross-validation method.The predictive capability of the tree also has been validated through existing experimental studies.The results indicate that the classification tree is not only interpretable but also highly predictive,with a testing accuracy level of 78.1%.The decision tree provides valuable insights for engineers assessing liquefaction potential beneath strip foundations.
文摘针对卡方自动交互诊断(CHAID)决策树易过拟合的问题,提出CHAID随机森林方法(CHAID Random Forest,CHAID-RF)。该方法采用随机采样、随机选择特征以及集成的策略,将CHAID决策树作为基分类器,形成CHAID-RF。为了验证CHAID-RF的有效性,选取CART、CHAID、SVM、RF作为对比算法,以准确率、加权查准率、加权查全率、加权F值作为分类模型评价指标,以均方根误差作为回归模型评价指标,采用10个分类数据集和7个回归数据集进行验证。实验结果表明CHAID-RF可行有效。
基金Under the auspices of National Natural Science Foundation of China (No. 40871188) National Key Technologies R&D Program of China (No. 2006BAD23B03)
文摘The main objective of this research is to determine the capacity of land cover classification combining spec- tral and textural features of Landsat TM imagery with ancillary geographical data in wetlands of the Sanjiang Plain, Heilongjiang Province, China. Semi-variograms and Z-test value were calculated to assess the separability of grey-level co-occurrence texture measures to maximize the difference between land cover types. The degree of spatial autocorrelation showed that window sizes of 3×3 pixels and 11×11 pixels were most appropriate for Landsat TM im- age texture calculations. The texture analysis showed that co-occurrence entropy, dissimilarity, and variance texture measures, derived from the Landsat TM spectrum bands and vegetation indices provided the most significant statistical differentiation between land cover types. Subsequently, a Classification and Regression Tree (CART) algorithm was applied to three different combinations of predictors: 1) TM imagery alone (TM-only); 2) TM imagery plus image texture (TM+TXT model); and 3) all predictors including TM imagery, image texture and additional ancillary GIS in- formation (TM+TXT+GIS model). Compared with traditional Maximum Likelihood Classification (MLC) supervised classification, three classification trees predictive models reduced the overall error rate significantly. Image texture measures and ancillary geographical variables depressed the speckle noise effectively and reduced classification error rate of marsh obviously. For classification trees model making use of all available predictors, omission error rate was 12.90% and commission error rate was 10.99% for marsh. The developed method is portable, relatively easy to im- plement and should be applicable in other settings and over larger extents.
文摘Urban tree species provide various essential ecosystem services in cities,such as regulating urban temperatures,reducing noise,capturing carbon,and mitigating the urban heat island effect.The quality of these services is influenced by species diversity,tree health,and the distribution and the composition of trees.Traditionally,data on urban trees has been collected through field surveys and manual interpretation of remote sensing images.In this study,we evaluated the effectiveness of multispectral airborne laser scanning(ALS)data in classifying 24 common urban roadside tree species in Espoo,Finland.Tree crown structure information,intensity features,and spectral data were used for classification.Eight different machine learning algorithms were tested,with the extra trees(ET)algorithm performing the best,achieving an overall accuracy of 71.7%using multispectral LiDAR data.This result highlights that integrating structural and spectral information within a single framework can improve the classification accuracy.Future research will focus on identifying the most important features for species classification and developing algorithms with greater efficiency and accuracy.
文摘Despite the widespread use of Decision trees (DT) across various applications, their performance tends to suffer when dealing with imbalanced datasets, where the distribution of certain classes significantly outweighs others. Cost-sensitive learning is a strategy to solve this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms, which are heuristic, tried to greedily select either a better splitting point or feature node, leading to local optima for tree nodes and ignoring the cost of the whole tree. In addition, determination of the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experimental results using datasets compared with state-of-the-art DT algorithms show that the SCDT method achieved the highest performance on most datasets. Moreover, SCDT also excels in other critical performance metrics, such as recall, precision, F1-score, and AUC, with notable results with average values of 83%, 87.3%, 85%, and 80.7%, respectively.
文摘This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as key features contributing to variance. However, while PCA effectively reduced dimensionality, it faced challenges in clearly distinguishing between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected Height and Reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model’s accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
文摘Flood disasters can have a serious impact on people's production and lives, and can cause hugelosses in lives and property security. Based on multi-source remote sensing data, this study establisheddecision tree classification rules through multi-source and multi-temporal feature fusion, classified groundobjects before the disaster and extracted flood information in the disaster area based on optical imagesduring the disaster, so as to achieve rapid acquisition of the disaster situation of each disaster bearing object.In the case of Qianliang Lake, which suffered from flooding in 2020, the results show that decision treeclassification algorithms based on multi-temporal features can effectively integrate multi-temporal and multispectralinformation to overcome the shortcomings of single-temporal image classification and achieveground-truth object classification.
基金the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support(QU-APC-2025).
文摘Leaf disease identification is one of the most promising applications of convolutional neural networks(CNNs).This method represents a significant step towards revolutionizing agriculture by enabling the quick and accurate assessment of plant health.In this study,a CNN model was specifically designed and tested to detect and categorize diseases on fig tree leaves.The researchers utilized a dataset of 3422 images,divided into four classes:healthy,fig rust,fig mosaic,and anthracnose.These diseases can significantly reduce the yield and quality of fig tree fruit.The objective of this research is to develop a CNN that can identify and categorize diseases in fig tree leaves.The data for this study was collected from gardens in the Amandi and Mamash Khail Bannu districts of the Khyber Pakhtunkhwa region in Pakistan.To minimize the risk of overfitting and enhance the model’s performance,early stopping techniques and data augmentation were employed.As a result,the model achieved a training accuracy of 91.53%and a validation accuracy of 90.12%,which are considered respectable.This comprehensive model assists farmers in the early identification and categorization of fig tree leaf diseases.Our experts believe that CNNs could serve as valuable tools for accurate disease classification and detection in precision agriculture.We recommend further research to explore additional data sources and more advanced neural networks to improve the model’s accuracy and applicability.Future research will focus on expanding the dataset by including new diseases and testing the model in real-world scenarios to enhance sustainable farming practices.
文摘Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan containing 93,925 samples per feature(seven well logs and one facies log) were used to classify four facies. Data preprocessing and preparation involve two processes: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression(LR), Decision Tree(DT), Support Vector Machine(SVM),Random Forest(RF), and Gradient Boosting(GB) for classification, were tested using different iterations and various combinations of features and parameters. The support vector radial kernel training model achieved an accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree(DT) Hyperparameter Optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of67.89%. The model's validation was carried out using the blind well validation approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. During training, the Gradient Boosting classifier achieved an accuracy score of 71.57%, and during testing, it achieved 69.89%. The Grid Search model achieved a higher accuracy score of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy score, with only 66.13% for training and66.12% for testing. For validation, the Gradient Boosting(GB) classifier model achieved an accuracy score of 75.41% on the blind well test, while the Gradient Boosting with Grid Search achieved an accuracy score of 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating the potential for overfitting. Random Forest(RF) and Gradient Boosting(GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and data nature.
基金The National Natural Science Foundation of China(No.60473045)the Technology Research Project of Hebei Province(No.05213573)the Research Plan of Education Office of Hebei Province(No.2004406)
文摘To deal with the problem that arises when the conventional fuzzy class-association method applies repetitive scans of the classifier to classify new texts,which has low efficiency, a new approach based on the FCR-tree(fuzzy classification rules tree)for text categorization is proposed.The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules.In comparison with classification rules,the fuzzy classification rules contain not only words,but also the fuzzy sets corresponding to the frequencies of words appearing in texts.Therefore,the construction of an FCR-tree and its structure are different from a CR-tree.To debase the difficulty of FCR-tree construction and rules retrieval,more k-FCR-trees are built.When classifying a new text,it is not necessary to search the paths of the sub-trees led by those words not appearing in this text,thus reducing the number of traveling rules.Experimental results show that the proposed approach obviously outperforms the conventional method in efficiency.