System upgrades in unmanned systems have made Unmanned Aerial Vehicle (UAV)-based patrolling and monitoring a preferred solution for ocean surveillance. However, dynamic environments and large-scale deployments pose significant challenges for efficient decision-making, necessitating a modular multi-agent control system. Deep Reinforcement Learning (DRL) and Decision Trees (DT) have been used for these complex decision-making tasks, but each has its limitations: DRL is highly adaptive but lacks interpretability, while DT is inherently interpretable but has limited adaptability. To overcome these challenges, we propose the Adaptive Interpretable Decision Tree (AIDT), an evolutionary-based algorithm that is both adaptable to diverse environmental settings and highly interpretable in its decision-making processes. We first construct a Markov decision process (MDP)-based simulation environment using the Cooperative Submarine Search task as a representative scenario for training and testing the proposed method. Specifically, we use the heat map as a state variable to address the issue of multi-agent input state proliferation. Next, we introduce a curiosity-guiding intrinsic reward to encourage comprehensive exploration and enhance algorithm performance. Additionally, we incorporate decision tree size as an influence factor in the adaptation process to balance task completion with computational efficiency. To further improve the generalization capability of the decision tree, we apply a normalization method to ensure consistent processing of input states. Finally, we validate the proposed algorithm in different environmental settings, and the results demonstrate both its adaptability and interpretability.
Database watermarking technologies provide an effective solution to data security problems by embedding a watermark in the database to prove copyright or trace the source of a data leak. However, when the watermarked database is used to build data mining models, such as decision trees, the distortion introduced by watermark embedding may produce a mining result that differs from the one obtained on the original database. Traditional watermarking algorithms mainly consider the statistical distortion of the data, such as the mean square error, but very few consider the effect of the watermark on database mining. Therefore, in this paper, a consistency-preserving database watermarking algorithm is proposed for decision trees. First, label classification statistics and label state transfer methods are proposed to adjust the watermarked data so that the model structure of the watermarked decision tree is the same as that of the original decision tree. Then, the splitting values of the decision tree are adjusted according to the defined constraint equations. Finally, the adjusted database yields a decision tree consistent with the original decision tree. The experimental results demonstrate that the proposed algorithm does not corrupt the watermarks and makes the watermarked decision tree consistent with the original decision tree with only a small distortion.
In an era of abundant data, machine learning has emerged as a pivotal tool for deciphering and managing the excess of information. This paper presents a comprehensive analysis of machine learning algorithms, focusing on the structure and efficacy of random forests in mitigating overfitting, a prevalent issue in decision tree models. It also introduces a novel approach to enhancing decision tree performance through an optimized pruning method called Adaptive Cross-Validated Alpha CCP (ACV-CCP). This method refines traditional cost complexity pruning by streamlining the selection of the alpha parameter, leveraging cross-validation within the pruning process to achieve a reliable, computationally efficient alpha selection that generalizes well to unseen data. By enhancing computational efficiency and balancing model complexity, ACV-CCP allows decision trees to maintain predictive accuracy while minimizing overfitting, effectively narrowing the performance gap between decision trees and random forests. Our findings illustrate how ACV-CCP contributes to the robustness and applicability of decision trees, providing a valuable perspective on achieving computationally efficient and generalized machine learning models.
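For orientation, the traditional cost complexity pruning criterion that ACV-CCP is said to refine can be written as follows. This is only the standard textbook formulation with a generic K-fold selection rule; the symbols are assumptions introduced here, and the abstract does not give the details of ACV-CCP itself.

```latex
% Standard cost-complexity criterion (not the ACV-CCP specifics):
%   R(T)   : training error (or total leaf impurity) of tree T
%   |T~|   : number of leaves of T
%   alpha  : complexity penalty, alpha >= 0
\[
  R_{\alpha}(T) = R(T) + \alpha \,\lvert \tilde{T} \rvert
\]
% For each alpha, T_alpha is the subtree of the full tree minimizing R_alpha.
% A generic K-fold cross-validated choice of alpha, where Err_k is the
% validation error on fold k of the tree pruned with penalty alpha:
\[
  \alpha^{*} = \arg\min_{\alpha \ge 0} \; \frac{1}{K} \sum_{k=1}^{K}
  \mathrm{Err}_{k}\!\left(T_{\alpha}^{(k)}\right)
\]
```

Larger values of alpha favor smaller trees, which is the lever that trades predictive accuracy against overfitting.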
To improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the following influence factors were studied: dissolved oxygen (DO), nitrate recirculation, sludge recycle, solids residence time (SRT), influent COD/TN, and hydraulic retention time (HRT). The results indicated that nitrogen removal could be increased by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervising nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.
AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
This paper explores the use of soft decision trees [1] in basic reinforcement learning applications to examine the efficacy of using passive-expert-like networks for optimal Q-value learning on Artificial Neural Networks (ANN). The soft decision tree networks were built using the PyTorch machine learning framework and OpenAI's Gym environment framework. The study aimed to assess the performance of soft decision tree networks on Cartpole as provided in the OpenAI Gym software package. The baseline against which the soft decision tree networks were compared was a simple Deep Neural Network using several linear layers with ReLU and Softmax activation functions for the input and output layers, respectively. All networks were trained using the backpropagation algorithm provided generically by PyTorch's Autograd module.
Based on a fuzzy neural network, the letter presents an approach for the induction of decision trees. The approach makes use of the weights of fuzzy mappings in the fuzzy neural network which has been trained. It can realize the optimization of fuzzy decision trees by branch cutting, and improve the ratio of correctness and efficiency of the induction of decision trees.
The ID3 algorithm is a classical decision tree learning algorithm in data mining. The algorithm tends to choose the attribute with more values, which affects the efficiency of classification and prediction when building a decision tree. This article proposes a new approach based on an improved ID3 algorithm. The new algorithm introduces an importance factor λ when calculating the information entropy. It can strengthen the label of important attributes of a tree and reduce the label of non-important attributes. The algorithm overcomes the flaw of the traditional ID3 algorithm, which tends to choose the attributes with more values, and also improves the efficiency and flexibility of generating decision trees.
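The bias that this abstract attributes to ID3 can be reproduced with a few lines of Python. The sketch below implements only the classical entropy and information-gain criterion; the importance factor λ is not defined in the abstract and is therefore not modeled, and the toy data set and attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting on one attribute (classical ID3 criterion)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "id" is unique per row, "windy" has only two values.
rows = [
    {"id": 1, "windy": "yes"},
    {"id": 2, "windy": "yes"},
    {"id": 3, "windy": "no"},
    {"id": 4, "windy": "no"},
]
labels = ["stay", "play", "play", "play"]

gain_id = information_gain(rows, labels, "id")
gain_windy = information_gain(rows, labels, "windy")
print(f"gain(id)={gain_id:.4f}  gain(windy)={gain_windy:.4f}")
```

Splitting on the unique `id` column yields pure single-row partitions, so it receives the largest gain even though it cannot generalize. This is the many-valued-attribute preference that a reweighted entropy is meant to counteract.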
In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years. However, these algorithms have been shown to be sensitive to noisy samples and do not work in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. Then, we design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
We consider various tasks of recognizing properties of DRSs (Decision Rule Systems) in this paper. As solution algorithms, DDTs (Deterministic Decision Trees) and NDTs (Nondeterministic Decision Trees) are used. An NDT can be considered as a representation of a DRS that satisfies the conditions of the considered task and covers all potential inputs. It has been shown that the minimum depth of a DDT solving the task does not exceed the square of the minimum depth of an NDT. The growth of the minimum number of nodes in DDTs and NDTs can be exponential with the size of the original DRSs. Therefore, in the general case, it is better to simulate the behavior of the DT (Decision Tree) on the given tuple of feature values rather than building the entire tree. We propose a greedy algorithm for such modeling and study its efficiency for a class of tasks of recognizing properties of DRSs. The obtained results may be of interest for data analysis in which both DRSs and DTs are intensively studied. In particular, these results make one think about the possibilities of transforming DRSs into DTs.
A binary complete decision table with many-valued decisions is a table with n attributes and 2^n pairwise distinct rows filled with numbers from the set {0,1}. Each row of this table is labeled with a nonempty finite set of decisions. For a given row of the table, the task is to find a decision from the set of decisions attached to the row. Such tables are generalizations of Boolean functions. They can also be viewed as representations of various problems related to systems of decision rules. In this paper, we consider three types of classes of binary complete decision tables with many-valued decisions, closed with respect to removal of columns and changing of decisions. For tables from these classes, we study the relationships between the minimum weighted depth of deterministic, nondeterministic, and (for one type of classes) strongly nondeterministic decision trees and the total weight of attributes attached to columns. Note that nondeterministic decision trees and strongly nondeterministic decision trees for decision tables can be interpreted as a way of representing the two types of systems of decision rules for these tables.
Despite the widespread use of decision trees (DT) across various applications, their performance tends to suffer on imbalanced datasets, where some classes significantly outnumber others. Cost-sensitive learning is a strategy to solve this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms are heuristic: they greedily select either a better splitting point or feature node, leading to local optima for tree nodes and ignoring the cost of the whole tree. In addition, determining the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experimental comparisons with state-of-the-art DT algorithms show that the SCDT method achieved the highest performance on most datasets. Moreover, SCDT also excels in other critical performance metrics, namely recall, precision, F1-score, and AUC, with average values of 83%, 87.3%, 85%, and 80.7%, respectively.
To address confrontation decision-making in multi-round air combat, a dynamic game decision method based on decision trees is proposed for unmanned aerial vehicle (UAV) air combat. Based on game theory and the confrontation characteristics of air combat, a dynamic game process is constructed that includes the strategy sets, the situation information, and the maneuver decisions for both sides. By analyzing the UAV's flight dynamics and the information of both sides, a payment matrix is established through the situation advantage function, performance advantage function, and profit function. Furthermore, the dynamic game decision problem is solved based on the linear induction method to obtain the Nash equilibrium solution, where the decision tree method is introduced to obtain the optimal maneuver decision, thereby improving the situation advantage in the next round of confrontation. Simulation results for multi-round air combat confrontation scenarios are presented to verify the effectiveness and advantages of the proposed method.
Mangroves are woody plant communities in the intertidal zone of tropical and subtropical coasts that play an important role in these zones. The infrared wave band is one of the key bands in the remote sensing identification of mangrove forest, and ALI (advanced land imagery) has a large number of infrared bands. Two angle indices were proposed based on liquid water absorption at band 5p and band 5 of EO-1 ALI, denoted as β1.25 and β1.65 respectively. A decision tree method using β1.25–β1.65 and NDVI (normalized difference vegetation index) was adopted to identify mangrove forest in EO-1 ALI imagery acquired at Shenzhen Bay. The results showed that the reflectance of mangrove forests at band 5p and band 5 was significantly lower than that of terrestrial vegetation because mangrove forests grow in coastal wetlands. This resulted in a greater β1.25–β1.65 value for mangrove forest than for terrestrial vegetation. The decision tree method using β1.25–β1.65 and NDVI effectively distinguishes mangrove forest from other land cover categories. The misclassification and leakage rates were 4.29% and 5.11% respectively. ALI sensors with many infrared bands could play an important role in discriminating mangrove forest.
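A decision tree of the kind described in the mangrove abstract can be pictured as two nested threshold tests. The sketch below is illustrative only: NDVI follows its standard definition, but the abstract does not give the formula for the angle indices or any threshold values, so the β1.25–β1.65 difference is taken as a precomputed input and both thresholds are hypothetical placeholders.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.

    Assumes nir + red is non-zero.
    """
    return (nir - red) / (nir + red)


def classify_pixel(nir, red, angle_index_diff, ndvi_min=0.3, angle_diff_min=0.0):
    """Two-level decision tree for one pixel.

    angle_index_diff stands for the beta_1.25 - beta_1.65 value, computed
    elsewhere. ndvi_min and angle_diff_min are hypothetical thresholds,
    not values reported in the study.
    """
    if ndvi(nir, red) < ndvi_min:
        return "non-vegetation"
    # Per the abstract, mangrove shows a greater angle-index difference
    # than terrestrial vegetation.
    if angle_index_diff > angle_diff_min:
        return "mangrove"
    return "terrestrial vegetation"


print(classify_pixel(nir=0.5, red=0.1, angle_index_diff=0.2))
```

In practice the thresholds would be fitted to labeled pixels from the imagery rather than fixed by hand.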
Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (such as multiple linear regression) are not very efficient. However, in chemometrics these methods are still not very widespread, mainly because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data, both for regression and classification. We try to consider all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcome of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
Power systems transport an increasing amount of electricity and, in the future, will involve more distributed renewables and dynamic interactions between equipment. The system response to disturbances must be secure and predictable to avoid power blackouts. The system response can be simulated in the time domain. However, this dynamic security assessment (DSA) is not computationally tractable in real time. A particularly promising approach is to train decision trees (DTs) from machine learning as interpretable classifiers to predict whether the system-wide responses to disturbances are secure. In most research, selecting the best DT model focuses on predictive accuracy. However, it is insufficient to focus solely on predictive accuracy. Missed alarms and false alarms have drastically different costs, and as security assessment is a critical task, interpretability is crucial for operators. In this work, the multiple objectives of interpretability, varying costs, and accuracies are considered for DT model selection. We propose a rigorous workflow to select the best classifier. In addition, we present two graphical approaches for visual inspection to illustrate the selection sensitivity to probability and impacts of disturbances. We propose cost curves to inspect selection combining all three objectives for the first time. Case studies on the IEEE 68 bus system and the French system show that the proposed approach allows for better DT selections, with an 80% increase in interpretability and a 5% reduction in expected operating cost, while making almost zero accuracy compromises. The proposed approach scales well with larger systems and can be used for models beyond DTs. Hence, this work provides insights into criteria for model selection in a promising application for methods from artificial intelligence (AI).
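The point that accuracy alone is an insufficient selection criterion when missed alarms and false alarms have different costs can be made concrete with a small calculation. The following is a minimal sketch, not the workflow of the paper: the candidate names, confusion counts, and the 20:1 cost ratio are all hypothetical.

```python
def expected_cost(missed_alarms, false_alarms, n_samples, cost_missed, cost_false):
    """Average misclassification cost per assessed operating condition."""
    return (missed_alarms * cost_missed + false_alarms * cost_false) / n_samples


N_TEST = 1000  # hypothetical number of test cases

# Hypothetical error counts of two candidate trees on the same test set.
candidates = {
    "deep_tree": {"missed_alarms": 5, "false_alarms": 40},
    "shallow_tree": {"missed_alarms": 12, "false_alarms": 10},
}

# Hypothetical costs: a missed alarm is assumed 20x as costly as a false alarm.
COST_MISSED, COST_FALSE = 20.0, 1.0

accuracy = {
    name: 1 - (c["missed_alarms"] + c["false_alarms"]) / N_TEST
    for name, c in candidates.items()
}
costs = {
    name: expected_cost(c["missed_alarms"], c["false_alarms"], N_TEST,
                        COST_MISSED, COST_FALSE)
    for name, c in candidates.items()
}

most_accurate = max(accuracy, key=accuracy.get)
cheapest = min(costs, key=costs.get)
print(most_accurate, cheapest)
```

With these assumed numbers the more accurate tree is not the cheaper one, which is why a selection procedure may need to weigh cost (and interpretability) alongside accuracy.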
Decision trees can be used to enhance the interpretability of neural networks. In this work, we compare the classification and interpretability performance of the normal decision tree and a type of soft decision tree when they are used to interpret the decision paths of CNN networks. With the help of feature visualization and human-labeled features, we demonstrate that the soft decision trees identify more consistent features while maintaining much higher classification performance than the normal decision tree.
Background: Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research explores the process of constructing common predictive models, namely logistic regression (LR), decision tree (DT), and multilayer perceptron (MLP), and focuses on specific details of applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to reliably assess generalization ability (that is, prediction performance) by the Monte Carlo method when the sample size is small.
Most stream data classification algorithms apply a supervised learning strategy that requires massive amounts of labeled data. Such approaches are impractical since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams having both unlabeled and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while generating high classification accuracy with high speed.
Funding (database watermarking study): Supported by the National Key Research and Development Program of China under Grant 2021YFB2700600; the National Natural Science Foundation of China under Grants 62132013 and 61902292; the Key Research and Development Programs of Shaanxi under Grant 2021ZDLGY06-03; and the Truth-Seeking Research Scholarship Fund of Xidian University.
Funding (FibroTest decision tree study): Supported by a grant of the Universidad Nacional Autonoma de Mexico, SDI.PTID.05.6.
Funding: Supported by the National Natural Science Foundation of China under Grants 60703013 and 10978011; the Key Program of the National Natural Science Foundation of China under Grant 60932008; the National Science Fund for Distinguished Young Scholars under Grant 50925625; and the China Postdoctoral Science Foundation.
Abstract: In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years, but it has been shown that these algorithms are sensitive to noisy samples and do not work in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. Then, we design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
Funding: Supported by King Abdullah University of Science and Technology (KAUST).
Abstract: In this paper, we consider various tasks of recognizing properties of DRSs (Decision Rule Systems). As solution algorithms, DDTs (Deterministic Decision Trees) and NDTs (Nondeterministic Decision Trees) are used. An NDT can be considered as a representation of a DRS that satisfies the conditions of the considered task and covers all potential inputs. It has been shown that the minimum depth of a DDT solving the task does not exceed the square of the minimum depth of an NDT. The minimum number of nodes in DDTs and NDTs can grow exponentially with the size of the original DRSs. Therefore, in the general case, it is better to simulate the behavior of the DT (Decision Tree) on the given tuple of feature values than to build the entire tree. We propose a greedy algorithm for such modeling and study its efficiency for a class of tasks of recognizing properties of DRSs. The obtained results may be of interest for data analysis, in which both DRSs and DTs are intensively studied. In particular, these results invite consideration of the possibilities of transforming DRSs into DTs.
Funding: Supported by King Abdullah University of Science and Technology (KAUST).
Abstract: A binary complete decision table with many-valued decisions is a table with n attributes and 2^n pairwise distinct rows filled with numbers from the set {0, 1}. Each row of this table is labeled with a nonempty finite set of decisions. For a given row of the table, the task is to find a decision from the set of decisions attached to the row. Such tables are generalizations of Boolean functions. They can also be viewed as representations of various problems related to systems of decision rules. In this paper, we consider three types of classes of binary complete decision tables with many-valued decisions, closed with respect to removal of columns and changing of decisions. For tables from these classes, we study the relationships between the minimum weighted depth of deterministic, nondeterministic, and (for one type of classes) strongly nondeterministic decision trees and the total weight of attributes attached to columns. Note that nondeterministic decision trees and strongly nondeterministic decision trees for decision tables can be interpreted as a way of representing the two types of systems of decision rules for these tables.
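As an illustration of the definition above, here is a small sketch of a binary complete decision table with many-valued decisions for n = 2, together with a deterministic decision tree that solves it. The table contents, the nested-tuple tree encoding, and the helper names `evaluate` and `solves` are all hypothetical choices made for this example; attribute weights are not modelled.

```python
from itertools import product

n = 2

# Every one of the 2^n rows is labeled with a nonempty set of decisions.
table = {
    (0, 0): {0},
    (0, 1): {1, 2},
    (1, 0): {1},
    (1, 1): {1, 2},
}

# A tree is either a decision (int) or a tuple
# (attribute_index, subtree_if_value_0, subtree_if_value_1).
tree = (0, (1, 0, 1), 1)


def evaluate(tree, row):
    """Follow the tree on `row`; return (decision, number of attributes queried)."""
    depth = 0
    while isinstance(tree, tuple):
        attr, if_zero, if_one = tree
        tree = if_one if row[attr] else if_zero
        depth += 1
    return tree, depth


def solves(tree, table):
    """The tree solves the table if, for every row, its output
    belongs to the set of decisions attached to that row."""
    return all(evaluate(tree, row)[0] in decisions
               for row, decisions in table.items())


is_complete = set(table) == set(product((0, 1), repeat=n))
max_depth = max(evaluate(tree, row)[1] for row in table)
print(is_complete, solves(tree, table), max_depth)
```

Because a row only requires some decision from its set, a tree may answer 1 on both `(0, 1)` and `(1, 1)` without ever distinguishing them, which is what separates this setting from an ordinary Boolean function.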
Abstract: Despite the widespread use of decision trees (DTs) across various applications, their performance tends to suffer when dealing with imbalanced datasets, where some classes significantly outnumber others. Cost-sensitive learning is one strategy for solving this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms are heuristic: they greedily select either a better splitting point or a better feature node, which leads to local optima for tree nodes and ignores the cost of the whole tree. In addition, determining the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experimental comparisons with state-of-the-art DT algorithms show that the SCDT method achieved the highest performance on most datasets. Moreover, SCDT also excels in other critical performance metrics, namely recall, precision, F1-score, and AUC, with average values of 83%, 87.3%, 85%, and 80.7%, respectively.
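To show why an F-measure is a natural fitness signal on imbalanced data, the sketch below computes the standard F-beta score from confusion counts and contrasts it with plain accuracy. It is only the textbook metric: how SCDT combines the specified costs with the F-measure is not described in the abstract, and the function names and example counts are illustrative.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-beta score of the positive (minority) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# A tree that predicts the majority class for all 100 samples
# (5 minority positives, 95 majority negatives).
trivial_acc = accuracy(tp=0, fp=0, fn=5, tn=95)
trivial_f1 = f_measure(tp=0, fp=0, fn=5)

# A tree that actually finds most of the minority class.
useful_f1 = f_measure(tp=8, fp=2, fn=4)
print(trivial_acc, trivial_f1, useful_f1)
```

The trivial tree scores 95% accuracy yet has an F-measure of zero, so a search guided by the F-measure cannot be satisfied by ignoring the minority class.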
Funding: Supported by the Major Projects for Science and Technology Innovation 2030 (2018AAA0100805).
Abstract: To address confrontation decision-making in multi-round air combat, a dynamic game decision method based on decision trees is proposed for unmanned aerial vehicle (UAV) air combat. Based on game theory and the confrontation characteristics of air combat, a dynamic game process is constructed that includes the strategy sets, the situation information, and the maneuver decisions for both sides. By analyzing the UAV's flight dynamics and the information of both sides, a payoff matrix is established through the situation advantage function, the performance advantage function, and the profit function. Furthermore, the dynamic game decision problem is solved based on the linear induction method to obtain the Nash equilibrium solution, where the decision tree method is introduced to obtain the optimal maneuver decision, thereby improving the situation advantage in the next round of confrontation. Simulation results for multi-round air combat confrontation scenarios are presented to verify the effectiveness and advantages of the proposed method.
Funding: National Natural Science Foundation of China (41201461).
Abstract: Mangroves are woody plant communities in the intertidal zone of tropical and subtropical coasts that play an important role in these zones. The infrared wave band is one of the key bands in the remote sensing identification of mangrove forest, and ALI (Advanced Land Imager) has a large number of infrared bands. Two angle indices were proposed based on liquid water absorption at band 5p and band 5 of EO-1 ALI, denoted as β1.25 and β1.65 respectively. A decision tree method was adopted to identify mangrove forest from EO-1 ALI imagery acquired at Shenzhen Bay, using β1.25–β1.65 and NDVI (normalized difference vegetation index). The results showed that the reflectance of mangrove forests at band 5p and band 5 was significantly lower than that of terrestrial vegetation, owing to the coastal wetland setting of mangrove forests. This resulted in a greater β1.25–β1.65 value for mangrove forest than for terrestrial vegetation. The decision tree method using β1.25–β1.65 and NDVI effectively distinguishes mangrove forest from other land cover categories. The misclassification and leakage rates were 4.29% and 5.11% respectively. ALI sensors with many infrared bands could play an important role in discriminating mangrove forest.
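A two-level decision tree of the kind described can be sketched as follows. NDVI uses its standard definition, (NIR - Red) / (NIR + Red). The angle-index difference β1.25–β1.65 is taken as a precomputed input (`beta_diff`) because the abstract does not give its formula, and both thresholds are placeholder values chosen for illustration, not the ones used in the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def classify_pixel(nir, red, beta_diff, ndvi_min=0.3, beta_diff_min=0.0):
    """Hypothetical two-level tree.

    Level 1: NDVI separates vegetation from other land cover.
    Level 2: a larger beta_diff (β1.25–β1.65) indicates mangrove,
             following the direction reported in the abstract.
    """
    if ndvi(nir, red) < ndvi_min:
        return "non-vegetation"
    if beta_diff > beta_diff_min:
        return "mangrove"
    return "terrestrial vegetation"


print(classify_pixel(nir=0.5, red=0.1, beta_diff=0.2))
print(classify_pixel(nir=0.5, red=0.1, beta_diff=-0.2))
print(classify_pixel(nir=0.2, red=0.2, beta_diff=0.5))
```

In practice the thresholds would be fitted to training pixels from the scene rather than fixed in advance.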
Abstract: Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (such as multiple linear regression) are not very efficient. However, in chemometrics these methods are still not very widespread, mainly because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data, both for regression and for classification. We try to consider all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcome of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
Funding: The authors were supported by a scholarship funded by the Nigerian National Petroleum Corporation (NNPC), the TU Delft AI Labs Programme, NL, and the research project IDLES, UK (EP/R045518/1).
Abstract: Power systems transport an increasing amount of electricity and, in the future, will involve more distributed renewables and dynamic interactions of the equipment. The system response to disturbances must be secure and predictable to avoid power blackouts. The system response can be simulated in the time domain. However, this dynamic security assessment (DSA) is not computationally tractable in real time. A particularly promising approach is to train decision trees (DTs) from machine learning as interpretable classifiers to predict whether the system-wide responses to disturbances are secure. In most research, selecting the best DT model focuses on predictive accuracy. However, it is insufficient to focus solely on predictive accuracy. Missed alarms and false alarms have drastically different costs, and as security assessment is a critical task, interpretability is crucial for operators. In this work, the multiple objectives of interpretability, varying costs, and accuracy are considered for DT model selection. We propose a rigorous workflow to select the best classifier. In addition, we present two graphical approaches for visual inspection to illustrate the sensitivity of the selection to the probability and impacts of disturbances. We propose cost curves to inspect selection, combining all three objectives for the first time. Case studies on the IEEE 68-bus system and the French system show that the proposed approach allows for better DT selections, with an 80% increase in interpretability and a 5% reduction in expected operating cost, while making almost zero accuracy compromises. The proposed approach scales well with larger systems and can be used for models beyond DTs. Hence, this work provides insights into criteria for model selection in a promising application for methods from artificial intelligence (AI).
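The point that accuracy alone is insufficient can be illustrated with a minimal expected-cost comparison. The sketch assumes the usual two-error cost model (probability of an insecure case times missed-alarm rate times missed-alarm cost, plus the analogous false-alarm term); the candidate trees, their error rates, and the cost values are invented for the example and are not taken from the paper.

```python
def expected_cost(fnr, fpr, p_insecure, cost_missed, cost_false):
    """Expected misclassification cost per assessed operating point.

    fnr: missed-alarm rate (insecure case predicted secure)
    fpr: false-alarm rate (secure case predicted insecure)
    """
    return (p_insecure * fnr * cost_missed
            + (1 - p_insecure) * fpr * cost_false)


# Hypothetical candidate trees with different error profiles.
candidates = {
    "tree_A": {"fnr": 0.02, "fpr": 0.10},
    "tree_B": {"fnr": 0.08, "fpr": 0.02},
}


def select(candidates, p_insecure, cost_missed, cost_false):
    """Return the name of the candidate with the lowest expected cost."""
    return min(
        candidates,
        key=lambda name: expected_cost(
            candidates[name]["fnr"], candidates[name]["fpr"],
            p_insecure, cost_missed, cost_false),
    )


# Missed alarms 100x more costly than false alarms vs. equal costs.
best_asymmetric = select(candidates, p_insecure=0.1,
                         cost_missed=100.0, cost_false=1.0)
best_symmetric = select(candidates, p_insecure=0.1,
                        cost_missed=1.0, cost_false=1.0)
print(best_asymmetric, best_symmetric)
```

With equal costs `tree_B` wins, but once missed alarms are weighted heavily the choice flips to `tree_A`. Sweeping `p_insecure` and the cost ratio over a range is the idea behind inspecting the selection with cost curves.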
Funding: National Defense Science and Technology Innovation Special Zone Project (No. 18-163-11-ZT-002-045-04); Engineering Research Center of State Financial Security, Ministry of Education, Central University of Finance and Economics, Beijing, 102206, China; Program for Innovation Research in Central University of Finance and Economics; National College Students' Innovation and Entrepreneurship Training Program "Research and development of interpretable algorithms and prototype system for small sample image recognition".
Abstract: Decision trees can be used to enhance the interpretability of neural networks. In this work, we compare the classification and interpretability performance of the normal decision tree and a type of soft decision tree when they are used to interpret the decision paths of CNNs. With the help of feature visualization and human-labeled features, we demonstrate that the soft decision trees identify more consistent features while maintaining much higher classification performance than the normal decision tree.
Funding: This work was supported by grants from the National Natural Science Foundation of China (No. 21003077), the College of Public Health of Tianjin Medical University in China (No. GWKY-2010-01), the Open Project of Key Laboratory of Advanced Energy Materials Chemistry (No. KLAEMC-OP201101) and the Natural Science Foundation of Tianjin, China (No. 08JCZDJC21400).
Abstract: Background: Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, namely logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of small sample size.
Funding: Supported by the National Natural Science Foundation of China (No. 60673024) and the "Eleventh Five" Preliminary Research Project of PLA (No. 102060206).
Abstract: Most stream data classification algorithms apply a supervised learning strategy that requires massive amounts of labeled data. Such approaches are impractical, since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while achieving high classification accuracy at high speed.