Although decision trees (DTs) are widely used across applications, their performance tends to suffer on imbalanced datasets, where some classes significantly outnumber others. Cost-sensitive learning is one strategy for addressing this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms are heuristic: they greedily select either a better splitting point or a better feature node, which leads to local optima at individual tree nodes and ignores the cost of the whole tree. In addition, determining the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experiments comparing SCDT with state-of-the-art DT algorithms show that SCDT achieved the highest performance on most datasets. SCDT also performs well on other critical metrics, with average recall, precision, F1-score, and AUC of 83%, 87.3%, 85%, and 80.7%, respectively.
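The abstract does not give the exact form of the F-Measure-based fitness function, so the following is only a minimal sketch of one way misclassification costs could enter an F-measure. The function name and the `fn_cost`/`fp_cost` parameters are assumptions for illustration, not the SCDT definition; with both costs set to 1 the score reduces to the standard F-beta measure.

```python
def cost_sensitive_f_measure(y_true, y_pred, positive=1,
                             fn_cost=1.0, fp_cost=1.0, beta=1.0):
    """F-beta score in which errors are scaled by hypothetical costs.

    With fn_cost == fp_cost == 1 this is the usual
    F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn_cost * fn + fp_cost * fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy labels: 1 is the minority (positive) class.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
plain = cost_sensitive_f_measure(y_true, y_pred)
penalised = cost_sensitive_f_measure(y_true, y_pred, fn_cost=3.0)
```

Raising `fn_cost` lowers the score of trees that miss minority-class samples, which is the kind of pressure a swarm-based search could use when ranking candidate trees.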
To solve multi-class fault diagnosis tasks, a decision tree support vector machine (DTSVM), which combines SVM and a decision tree using the concept of dichotomy, is proposed. Because the classification performance of DTSVM depends heavily on its structure, a genetic algorithm is introduced into the formation of the decision tree to cluster the classes so that the distance between the clustering centers of the two sub-classes is maximized; the most separable classes are thus separated at each node of the decision tree. Numerical simulations on three datasets demonstrate that the proposed method has better performance and higher generalization ability than the conventional "one-against-all" and "one-against-one" methods.
Intelligent dormitory allocation systems based on students' individual differences and needs are taking precedence over the traditional dormitory allocation system, which neglects students' personality traits, causes dormitory disputes, and affects students' quality of life and academic quality. This paper collects freshmen's data on personal preferences, conducts a classification comparison, and uses a decision tree classification algorithm based on the information gain principle as the core algorithm for dormitory allocation. It determines the description rules for students' personal preferences and decision tree classification preferences, completes the conceptual design of the database (entity relations and data dictionaries), meets students' personality classification requirements for the dormitory, and lays the foundation for an intelligent dormitory allocation system.
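The information gain principle mentioned above can be sketched as follows. The attribute names (`bedtime`, `study_place`) and the dormitory labels are hypothetical stand-ins for the paper's freshman preference data, which the abstract does not list.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical preference records and dormitory groups.
rows = [
    {"bedtime": "early", "study_place": "room"},
    {"bedtime": "early", "study_place": "library"},
    {"bedtime": "late", "study_place": "room"},
    {"bedtime": "late", "study_place": "library"},
]
labels = ["A", "A", "B", "B"]

gain_bedtime = information_gain(rows, labels, "bedtime")
gain_study = information_gain(rows, labels, "study_place")
best_attr = max(["bedtime", "study_place"],
                key=lambda a: information_gain(rows, labels, a))
```

In this toy data `bedtime` separates the two groups perfectly, so an information-gain-based tree would test it first.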
This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as key features contributing to variance. However, while PCA effectively reduced dimensionality, it faced challenges in clearly distinguishing between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model's accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
Karst rocky desertification is a form of land degradation caused by the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas. However, these methods use only pixel brightness characteristics, so classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a new technology for remote sensing image classification. In this study, we select Kaizuo Township, a rocky desertification area, as a case study and use ASTER image data, DEM, and lithology data. We extract the normalized difference vegetation index, ratio vegetation index, terrain slope, and other data to establish classification rules and build decision trees, and we obtain the classified images using the ENVI software. By calculating the classification accuracy and kappa coefficient, we find that better classification results can be obtained and that desertification information can be extracted automatically. Classification accuracy can be improved further if more remote sensing image bands and a higher-resolution DEM are used and if data errors are reduced during processing.
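The two vegetation indices named above have standard definitions, NDVI = (NIR - Red) / (NIR + Red) and RVI = NIR / Red. The sketch below computes them for a single pixel and applies a small hand-written rule tree; the thresholds, the class names, and the `classify_pixel` function are hypothetical and are not the rules established in the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def rvi(nir, red):
    """Ratio vegetation index for one pixel."""
    return nir / red if red else float("inf")


def classify_pixel(nir, red, slope_deg, is_carbonate):
    """Toy rule tree; every threshold here is an illustrative assumption."""
    if not is_carbonate:          # lithology rule: only carbonate rock is karst
        return "non-karst"
    v = ndvi(nir, red)
    if v >= 0.5:                  # dense vegetation cover
        return "no desertification"
    if v >= 0.3:
        return "slight"
    if slope_deg > 25 and v < 0.15:   # steep and nearly bare
        return "severe"
    return "moderate"


# Reflectance values given as integer digital numbers to keep arithmetic exact.
example_ndvi = ndvi(60, 20)
example_rvi = rvi(60, 20)
labels = [
    classify_pixel(60, 20, 10, True),
    classify_pixel(30, 25, 30, True),
    classify_pixel(60, 20, 10, False),
]
```

Combining spectral indices with terrain slope and lithology in this way is what lets a rule tree use more than pixel brightness alone.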
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established using the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
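C4.5 ranks candidate attributes by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below illustrates that criterion only; the attribute names (`wind`, `station`) and the loss grades are hypothetical and are not taken from the Mudanjiang data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """C4.5 split criterion: information gain / split information."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g)
                                 for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info else 0.0


# Hypothetical disaster records with a loss grade as the class label.
rows = [
    {"wind": "strong", "station": "s1"},
    {"wind": "strong", "station": "s2"},
    {"wind": "weak", "station": "s3"},
    {"wind": "weak", "station": "s4"},
]
labels = ["high", "high", "low", "low"]

ratio_wind = gain_ratio(rows, labels, "wind")
ratio_station = gain_ratio(rows, labels, "station")
```

Both attributes have the same information gain here, but the identifier-like `station` attribute is penalized by its larger split information, which is the motivation for using gain ratio instead of raw gain.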
Machine learning algorithms are an important means of performing landslide susceptibility assessments, but most studies use GIS-based classification methods to conduct susceptibility zonation. This study presents a machine learning approach based on the C5.0 decision tree (DT) model and the K-means cluster algorithm to produce a regional landslide susceptibility map. Yanchang County, a typical landslide-prone area located in northwestern China, was taken as the area of interest to introduce the proposed application procedure. A landslide inventory containing 82 landslides was prepared and subsequently randomly partitioned into two subsets: training data (70% of landslide pixels) and validation data (30% of landslide pixels). Fourteen landslide influencing factors were considered in the input dataset and were used to calculate the landslide occurrence probability based on the C5.0 decision tree model. Susceptibility zonation was implemented according to the cut-off values calculated by the K-means cluster algorithm. The validation results of the model performance analysis showed that the AUC (area under the receiver operating characteristic (ROC) curve) of the proposed model was the highest, reaching 0.88, compared with traditional models (support vector machine (SVM) = 0.85, Bayesian network (BN) = 0.81, frequency ratio (FR) = 0.75, weight of evidence (WOE) = 0.76). The landslide frequency ratio and frequency density of the high susceptibility zones were 6.76/km² and 0.88/km², respectively, which were much higher than those of the low susceptibility zones. The top 20% interval of landslide occurrence probability contained 89% of the historical landslides but accounted for only 10.3% of the total area. Our results indicate that the distribution of high susceptibility zones was more focused without containing more "stable" pixels. Therefore, the obtained susceptibility map is suitable for application to landslide risk management practices.
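Deriving zonation cut-off values from K-means can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions: the probabilities are made-up numbers standing in for the C5.0 model output, the quantile-based initialization is a choice made here for determinism, and taking midpoints between adjacent cluster centers as cut-offs is one plausible reading of the procedure rather than the paper's confirmed rule.

```python
from bisect import bisect_right


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar data; returns the sorted cluster centers."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: one center at the middle of each of k quantile bins.
    centers = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:    # assignments are stable
            break
        centers = updated
    return sorted(centers)


def cutoffs_from_centers(centers):
    """Midpoints between adjacent centers, used as zone boundaries."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def zone_index(probability, cuts):
    """0 = lowest susceptibility zone, len(cuts) = highest."""
    return bisect_right(cuts, probability)


# Hypothetical landslide occurrence probabilities for nine pixels.
probs = [0.05, 0.10, 0.12, 0.50, 0.55, 0.52, 0.90, 0.95, 0.92]
cuts = cutoffs_from_centers(kmeans_1d(probs, k=3))
zones = [zone_index(p, cuts) for p in (0.10, 0.52, 0.95)]
```

Letting the clustering choose the boundaries avoids fixing equal-width or manually chosen probability intervals for the susceptibility classes.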
This article presents two approaches for automatically building knowledge bases for soil resource mapping. The methods use decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than with the conventional knowledge acquisition approach. The knowledge bases built by these two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China, using bi-temporal TM imagery and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared with an existing soil map based on field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping the distribution of soil classes over the study area.
This paper focuses on improving decision tree induction algorithms when a tie appears during the rule generation procedure for specific training datasets. The tie occurs when the target class outcomes appear in equal proportions among the records of a leaf node, so that majority voting cannot be applied. To resolve this exception, we propose to base the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
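The tie situation described above can be illustrated with a small sketch that covers only the k-NN option. The function name `leaf_predict`, the use of Hamming distance on categorical features, and the toy records are assumptions made here; the paper's actual procedure, including its NB and ARM variants, is not specified in the abstract.

```python
from collections import Counter


def leaf_predict(leaf_rows, leaf_labels, query, k=3):
    """Predict at a non-empty leaf; fall back to k-NN when voting ties.

    leaf_rows: feature tuples of the training records in the leaf.
    query:     feature tuple of the record being classified.
    """
    counts = Counter(leaf_labels).most_common()
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]       # ordinary majority vote

    # Tie: rank the leaf records by Hamming distance to the query and
    # let the k closest ones vote (k odd avoids a second tie for 2 classes).
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    ranked = sorted(zip(leaf_rows, leaf_labels),
                    key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# A leaf holding two "yes" and two "no" records, so majority voting fails.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
pred_near_yes = leaf_predict(rows, labels, ("a", "x"))
pred_near_no = leaf_predict(rows, labels, ("b", "y"))
```

The fallback uses feature values that the tree did not need for splitting, so two queries reaching the same tied leaf can still receive different predictions.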
Decision tree induction algorithms have been used for classification in a wide range of application domains. In the process of constructing a tree, the criterion for selecting test attributes influences the classification accuracy of the tree. In this paper, the degree of dependency of the decision attribute on a condition attribute, based on rough set theory, is used as a heuristic for selecting the attribute that will best separate the samples into individual classes. The result of an example shows that, compared with the entropy-based approach, our approach is a better way to select nodes for constructing decision trees.
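In rough set theory, the degree of dependency of a decision attribute D on a set of condition attributes C is commonly defined as γ_C(D) = |POS_C(D)| / |U|, where POS_C(D) collects the records whose C-equivalence class maps to a single decision value. The sketch below computes this quantity and uses it to pick a splitting attribute; the weather-style toy table is hypothetical and is not the example from the paper.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, decision_attr):
    """gamma_C(D): fraction of records in the positive region POS_C(D)."""
    blocks = defaultdict(list)    # C-equivalence class -> decision values
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks[key].append(row[decision_attr])
    # A block lies in the positive region when all its decisions agree.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(rows)


# Hypothetical decision table.
table = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "no"},
]

gamma_outlook = dependency_degree(table, ["outlook"], "play")
gamma_windy = dependency_degree(table, ["windy"], "play")
gamma_both = dependency_degree(table, ["outlook", "windy"], "play")
root_attr = max(["outlook", "windy"],
                key=lambda a: dependency_degree(table, [a], "play"))
```

Unlike entropy, this criterion counts only the records that the attribute classifies with certainty, so an attribute whose blocks are all mixed scores zero regardless of how uneven the mixture is.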
In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years, but these algorithms have been shown to be sensitive to noisy samples and do not work in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. We then design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
Big data is usually unstructured, and many applications require analysis in real time. The decision tree (DT) algorithm is widely used to analyze big data. Selecting the optimal depth of a DT is a time-consuming process, as it requires many iterations. In this paper, we have designed a modified version of a DT. The tree aims to achieve optimal depth by self-tuning its running parameters and improving accuracy. The efficiency of the modified DT was verified using two datasets (airport and fire datasets). The airport dataset has 500000 instances and the fire dataset has 600000 instances. A comparison between the modified DT and the standard DT shows that the modified DT performs better. This comparison was conducted on multiple nodes with the Apache Spark tool using Amazon Web Services, resulting in an accuracy increase of 6.85% for the first dataset and 8.85% for the airport dataset. In conclusion, the modified DT showed better accuracy in handling different-sized datasets compared with the standard DT algorithm.
A Fourier-kernel-based time-frequency transform is a proven candidate for non-stationary signal analysis and pattern recognition because of its ability to predict the time-localized spectrum and global phase reference characteristics. However, it suffers from heavy computational overhead and long execution time. This paper therefore uses a novel fast discrete sparse S-transform (SST) suitable for extracting the time-frequency response to monitor non-stationary signal parameters, which can ultimately be used for disturbance detection and pattern classification. From the sparse S-transform matrix, relevant features are extracted and used by a fuzzy decision tree based classifier to distinguish among different non-stationary signals. The algorithm is robust under noisy conditions. Various power quality signals and chirp signals have been simulated and tested with the proposed technique, including under noisy conditions. Some real-time mechanical fault signals have also been collected to demonstrate the efficiency of the proposed algorithm. All the simulation results indicate that the proposed technique is highly efficient.
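To address the limitations, low efficiency, and subjectivity of manually identifying the rocks inside a mountain during geological modeling, an automatic rock-type identification technique based on reflected seismic wave signals is proposed. Rock mechanical parameters are obtained by processing the reflected seismic wave signals, and the Decision Tree ID3 algorithm is applied to rock density, wave velocity, elastic modulus, and shear modulus to build a lithology identification classifier. The classifier was used to determine the rock types inside a mountain. The results show that the study area consists mostly of gabbro, with basalt the least abundant. Comparing the model's classification results with the actual geology of the study area, the correct identification rate reached 93% for basalt, 100% for andesite and diorite, and 88% for granite. The classifier built with the decision tree can identify rock lithology from reflected seismic wave signals efficiently and accurately.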
Snort rule-checking is one of the most popular forms of Network Intrusion Detection Systems (NIDS). In this article, we show that Snort priorities of true positive traffic (real attacks) can be approximated in real-time, in the context of high speed networks, by a decision tree classifier, using the information of only three easily extracted features (protocol, source port, and destination port), with an accuracy of 99%. Snort issues alert priorities based on its own default set of attack classes (34 classes) that are used by the default set of rules it provides. But the decision tree model is able to predict the priorities without using this default classification. The obtained tagger can provide a useful complement to an anomaly detection intrusion detection system.
Classification for handwritten Chinese character recognition can be viewed as a transformation in discrete vector space. In this paper, from the viewpoint of discrete vector space transformation, a new 4-corner codes classifier for handwritten Chinese characters, based on the decision tree inductive learning algorithm ID3, is presented. With a feature extraction controller, the classifier can reduce the number of extracted features and accelerate classification. Experimental results show that the 4-corner codes classifier performs well in both recognition accuracy and speed.
Most stream data classification algorithms apply a supervised learning strategy, which requires massive amounts of labeled data. Such approaches are impractical because labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while achieving high classification accuracy at high speed.
The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules. In comparison with classification rules, the fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction of an FCR-tree and its structure differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
Accurate classification of power quality disturbance is the premise and basis for improving and governing power quality. A method for power quality disturbance classification based on time-frequency domain multi-feature and decision tree is presented. Wavelet transform and S-transform are used to extract the feature quantity of each power quality disturbance signal, and a decision tree with classification rules is then constructed for classification and recognition based on the extracted feature quantity. The classification rules and decision tree classifier are established by combining the energy spectrum feature quantity extracted by wavelet transform and other seven time-frequency domain feature quantities extracted by S-transform. Simulation results show that the proposed method can effectively identify six types of common single disturbance signals and two mixed disturbance signals, with fast classification speed and adequate noise resistance. Its classification accuracy is also higher than those of support vector machine (SVM) and k-nearest neighbor (KNN) algorithms. Compared with the method that only uses S-transform, the proposed feature extraction method has more abundant features and higher classification accuracy for power quality disturbance.
Here, we demonstrate the application of the Decision Tree Classification (DTC) method for lithological mapping from multi-spectral satellite imagery. The area of investigation is Lake Magadi in the East African Rift Valley in Kenya. The work involves the collection of rock and soil samples in the field, their analysis using reflectance and emittance spectroscopy, and the processing and interpretation of Advanced Spaceborne Thermal Emission and Reflection Radiometer data through the DTC method. The DTC method is strictly non-parametric, flexible, and simple, and it does not require assumptions regarding the distributions of the input data. It has been successfully used in a wide range of classification problems. The DTC method successfully mapped the chert and trachyte series rocks, including clay minerals and evaporites of the area, with a high overall accuracy (86%). The high classification accuracies of the developed decision tree suggest its ability to adapt to noise and nonlinear relations often observed in the surface materials in space-borne spectral image data, without making assumptions about the distribution of input data. Moreover, the present work found the DTC method useful for accurately mapping lithological variations in vast rugged terrain, where the data inherently contain different sources of noise even after considerable radiance and atmospheric correction.
Funding (DTSVM multi-class fault diagnosis study): supported by the National Natural Science Foundation of China (60604021, 60874054).
Funding (Mudanjiang agro-meteorological disaster forecasting study): supported by the Science and Technology Plan of Mudanjiang City (G200920064) and the Teaching Reform Construction of Mudanjiang Normal University (10-xj11080).
Funding (landslide susceptibility study): This research is funded by the National Natural Science Foundation of China (Grant Nos. 41807285 and 51679117); the Key Project of the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (SKLGP2019Z002); the National Science Foundation of Jiangxi Province, China (20192BAB216034); the China Postdoctoral Science Foundation (2019M652287 and 2020T130274); the Jiangxi Provincial Postdoctoral Science Foundation (2019KY08); and the Fundamental Research Funds for National Universities, China University of Geosciences (Wuhan).
Funding: Project supported by the National Natural Science Foundation of China (No. 40101014) and by the Science and Technology Committee of Zhejiang Province (No. 001110445), China.
Abstract: This article presents two approaches for the automated building of knowledge bases for soil resource mapping. The methods use decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than with the conventional knowledge acquisition approach. The knowledge bases built by the two methods were used by the knowledge classifier for soil type classification of the Longyou area, Zhejiang Province, China, using bi-temporal TM imagery and GIS data. To evaluate the performance of the resulting knowledge bases, the classification results were compared with an existing soil map based on a field survey. The accuracy assessment and analysis of the resulting soil maps suggested that the knowledge bases built by these two methods were of good quality for mapping the distribution of soil classes over the study area.
Abstract: This paper focuses on improving decision tree induction algorithms when a tie appears during the rule generation procedure for specific training datasets. The tie occurs when the target class outcomes in a leaf node's records are in equal proportions, so that majority voting cannot be applied. To resolve this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
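A minimal sketch of the tie situation and one possible fallback is given here. It covers only the k-NN option and is not the paper's procedure: the helper names, the use of squared Euclidean distance on numeric features, and the toy records are all assumptions made for illustration.

```python
from collections import Counter


def majority_or_none(labels):
    """Return the majority label, or None when the top classes are tied.
    Assumes the leaf holds at least one record."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def knn_label(query, records, k=3):
    """Vote among the k records closest to the query (squared Euclidean)."""
    ranked = sorted(
        records,
        key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], query)),
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def predict_at_leaf(query, leaf_records, k=3):
    """Use majority voting when possible, otherwise fall back to k-NN."""
    label = majority_or_none([lab for _, lab in leaf_records])
    return label if label is not None else knn_label(query, leaf_records, k)


# Hypothetical leaf holding two records of each class (a tie).
leaf = [
    ((1.0, 1.0), "yes"),
    ((1.2, 0.9), "yes"),
    ((5.0, 5.0), "no"),
    ((5.1, 4.8), "no"),
]
near_yes = predict_at_leaf((1.1, 1.0), leaf)
near_no = predict_at_leaf((4.9, 5.0), leaf)
```

With two classes and an odd k the fallback vote cannot tie; with more classes a tie is still possible, and this sketch then returns the tied class whose record is nearest to the query.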
Abstract: Decision tree induction algorithms have been used for classification in a wide range of application domains. In constructing a tree, the criterion for selecting test attributes influences the classification accuracy of the tree. In this paper, the degree of dependency of the decision attribute on a condition attribute, based on rough set theory, is used as a heuristic for selecting the attribute that best separates the samples into individual classes. The result of an example shows that, compared with the entropy-based approach, our approach is a better way to select nodes for constructing decision trees.
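The heuristic can be sketched with the standard rough-set definition of the degree of dependency, gamma = |POS| / |U|, where the positive region POS collects the rows whose condition-attribute values determine the decision unambiguously. The function names and the toy table are hypothetical; the paper's own example is not reproduced here.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, decision_attr):
    """gamma = |POS| / |U| for the given condition attributes."""
    blocks = defaultdict(list)
    for row in rows:
        # Rows with identical condition values form one equivalence class.
        blocks[tuple(row[a] for a in cond_attrs)].append(row[decision_attr])
    # A class belongs to the positive region if all its decisions agree.
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def best_attribute(rows, candidates, decision_attr):
    """Pick the single attribute on which the decision depends most."""
    return max(candidates,
               key=lambda a: dependency_degree(rows, [a], decision_attr))


# Made-up decision table, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
gamma_outlook = dependency_degree(rows, ["outlook"], "play")
gamma_windy = dependency_degree(rows, ["windy"], "play")
root = best_attribute(rows, ["outlook", "windy"], "play")
```

In this table "outlook" settles the decision for four of the six rows while "windy" alone settles none, so "outlook" would be chosen as the test attribute.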
Funding: Supported by the National Natural Science Foundation of China under Grants 60703013 and 10978011, the Key Program of the National Natural Science Foundation of China under Grant 60932008, the National Science Fund for Distinguished Young Scholars under Grant 50925625, and the China Postdoctoral Science Foundation.
Abstract: In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years; these algorithms have been shown to be sensitive to noisy samples and do not work in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. We then design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
Abstract: Big data is usually unstructured, and many applications require analysis in real time. The decision tree (DT) algorithm is widely used to analyze big data. Selecting the optimal depth of a DT is a time-consuming process because it requires many iterations. In this paper, we have designed a modified version of a DT. The tree aims to achieve optimal depth by self-tuning its running parameters and improving accuracy. The efficiency of the modified DT was verified using two datasets (airport and fire datasets). The airport dataset has 500000 instances and the fire dataset has 600000 instances. A comparison between the modified DT and the standard DT showed that the modified DT performs better. The comparison was conducted on multiple nodes with the Apache Spark tool using Amazon Web Services. Accuracy increased by 6.85% for the first dataset and 8.85% for the airport dataset. In conclusion, the modified DT showed better accuracy in handling different-sized datasets than the standard DT algorithm.
Abstract: A Fourier kernel based time-frequency transform is a proven candidate for non-stationary signal analysis and pattern recognition because of its ability to predict the time-localized spectrum and global phase reference characteristics. However, it suffers from heavy computational overhead and long execution time. The paper therefore uses a novel fast discrete sparse S-transform (SST) suitable for extracting the time-frequency response to monitor non-stationary signal parameters, which can ultimately be used for disturbance detection and pattern classification. From the sparse S-transform matrix, relevant features are extracted and used by a fuzzy decision tree based classifier to distinguish among different non-stationary signals. The algorithm is robust under noisy conditions. Various power quality and chirp signals have been simulated and tested with the proposed technique, including in noisy conditions. Some real-time mechanical fault signals have been collected to demonstrate the efficiency of the proposed algorithm. All the simulation results imply that the proposed technique is highly efficient.
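Abstract: To address the limitations, low efficiency, and susceptibility to subjective factors of manually identifying the rock inside a mountain during geological modeling, an automatic rock type identification technique based on reflected seismic wave signals is proposed. Rock mechanical parameters are obtained by processing the reflected seismic wave signals, and the Decision Tree ID3 algorithm is applied to the extracted rock density, wave velocity, elastic modulus, and shear modulus to build a lithology identification classifier. The classifier was used to determine the rock types inside a mountain. The results show that the interior of the study area is mostly gabbro, with basalt the least abundant. Comparing the model's classification results with the actual geology of the study area, the correct classification rate reached 93% for basalt, 100% for andesite and diorite, and 88% for granite. The classifier built with the decision tree can identify rock lithology efficiently and accurately from reflected seismic wave signals.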
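ID3 chooses the split attribute with the largest information gain. The sketch below shows that criterion on a made-up table; the binned "density" and "velocity" values are an assumption (classic ID3 works on categorical attributes, and the abstract does not say how the continuous rock parameters were discretized), and the rows are not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus its expected entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical, already-binned samples; values are for illustration only.
samples = [
    {"density": "high", "velocity": "high", "rock": "gabbro"},
    {"density": "high", "velocity": "low", "rock": "gabbro"},
    {"density": "low", "velocity": "high", "rock": "granite"},
    {"density": "low", "velocity": "low", "rock": "granite"},
]
gain_density = information_gain(samples, "density", "rock")
gain_velocity = information_gain(samples, "velocity", "rock")
```

Here splitting on "density" separates the two rock types completely while "velocity" leaves them mixed, so ID3 would place "density" at the root of this toy tree.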
Abstract: Snort rule-checking is one of the most popular forms of Network Intrusion Detection Systems (NIDS). In this article, we show that the Snort priorities of true positive traffic (real attacks) can be approximated in real time, in the context of high-speed networks, by a decision tree classifier that uses only three easily extracted features (protocol, source port, and destination port), with an accuracy of 99%. Snort issues alert priorities based on its own default set of attack classes (34 classes), which are used by the default set of rules it provides. The decision tree model, however, is able to predict the priorities without using this default classification. The resulting tagger can provide a useful complement to an anomaly-based intrusion detection system.
Abstract: Classification for handwritten Chinese character recognition can be viewed as a transformation in discrete vector space. In this paper, from the viewpoint of discrete vector space transformation, a new 4-corner codes classifier for handwritten Chinese characters based on the decision tree inductive learning algorithm ID3 is presented. With a feature extraction controller, the classifier can reduce the number of extracted features and accelerate classification. Experimental results show that the 4-corner codes classifier performs well in both recognition accuracy and speed.
Funding: Supported by the National Natural Science Foundation of China (No. 60673024) and the "Eleventh Five" Preliminary Research Project of PLA (No. 102060206).
Abstract: Most stream data classification algorithms apply a supervised learning strategy that requires massive amounts of labeled data. Such approaches are impractical because labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while achieving high classification accuracy at high speed.
Funding: The National Natural Science Foundation of China (No. 60473045), the Technology Research Project of Hebei Province (No. 05213573), and the Research Plan of the Education Office of Hebei Province (No. 2004406).
Abstract: The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules in which many words are repeated. Unlike ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies of the words appearing in texts. The construction and structure of an FCR-tree therefore differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
Funding: Supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2019JM-544).
Abstract: Accurate classification of power quality disturbances is the premise and basis for improving and governing power quality. A method for power quality disturbance classification based on time-frequency domain multi-features and a decision tree is presented. The wavelet transform and S-transform are used to extract the feature quantities of each power quality disturbance signal, and a decision tree with classification rules is then constructed for classification and recognition based on the extracted feature quantities. The classification rules and decision tree classifier are established by combining the energy spectrum feature quantity extracted by the wavelet transform with seven other time-frequency domain feature quantities extracted by the S-transform. Simulation results show that the proposed method can effectively identify six types of common single disturbance signals and two mixed disturbance signals, with fast classification speed and adequate noise resistance. Its classification accuracy is also higher than those of the support vector machine (SVM) and k-nearest neighbor (KNN) algorithms. Compared with the method that uses only the S-transform, the proposed feature extraction method provides more abundant features and higher classification accuracy for power quality disturbances.
Abstract: Here, we demonstrate the application of the Decision Tree Classification (DTC) method for lithological mapping from multi-spectral satellite imagery. The area of investigation is Lake Magadi in the East African Rift Valley in Kenya. The work involves the collection of rock and soil samples in the field, their analysis using reflectance and emittance spectroscopy, and the processing and interpretation of Advanced Spaceborne Thermal Emission and Reflection Radiometer data through the DTC method. The DTC method is strictly non-parametric, flexible, and simple, and it does not require assumptions regarding the distributions of the input data. It has been used successfully in a wide range of classification problems. The DTC method successfully mapped the chert and trachyte series rocks, including clay minerals and evaporites of the area, with a high overall accuracy (86%). The high classification accuracies of the developed decision tree suggest its ability to adapt to the noise and nonlinear relations often observed in the surface materials in space-borne spectral image data without making assumptions about the distribution of the input data. Moreover, the present work found the DTC method useful for accurately mapping lithological variations in the vast rugged terrain, where the data inherently contain different sources of noise even after considerable radiance and atmospheric correction.