Funding: Anhui Province Natural Science Research Project of Colleges and Universities (2023AH040321); Excellent Scientific Research and Innovation Team of Anhui Colleges (2022AH010098).
Abstract: The presence of numerous uncertainties in hybrid decision information systems (HDISs) makes attribute reduction a formidable task. Existing attribute reduction algorithms, including those based on Pawlak attribute importance, the Skowron discernibility matrix, and information entropy, struggle to handle multiple uncertainties in HDISs simultaneously, such as precisely measuring the differences between nominal attribute values and dealing with attributes that have fuzzy boundaries or abnormal values. To address these issues, this paper studies attribute reduction in HDISs. First, a novel metric based on the decision attribute, named the supervised distance, is introduced to accurately quantify the differences between nominal attribute values. Then, based on this metric, a novel fuzzy relation is defined from the perspective of "feedback on parity of attribute values to attribute sets"; this relation helps address the challenges posed by abnormal attribute values. Furthermore, using the new fuzzy relation, the fuzzy conditional information entropy is defined to handle fuzzy attributes. It quantifies the uncertainty associated with fuzzy attribute values, providing a framework for handling fuzzy information in hybrid information systems. Finally, an attribute reduction algorithm based on the fuzzy conditional information entropy is presented. Experimental results on 12 datasets show that the average reduction rate of our algorithm reaches 84.04%, and that classification accuracy improves by 3.91% compared with the original datasets and by an average of 11.25% compared with 9 other state-of-the-art reduction algorithms. These results indicate that our algorithm is highly effective in managing the intricate uncertainties inherent in hybrid data.
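As a rough illustration of the entropy named above, the sketch below computes a fuzzy conditional information entropy in a form common in the fuzzy rough set literature, H(D|B) = -(1/n) Σ_i log(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|), where |[x_i]_B| = Σ_j R_B(x_i, x_j). The fuzzy relation used here (1 minus the absolute difference of values pre-normalized to [0, 1], combined across attributes by min) is an assumption standing in for the paper's supervised-distance-based relation, which the abstract does not define.

```python
import math
from typing import List, Sequence


def fuzzy_similarity_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed fuzzy relation R_B: min over attributes of 1 - |a - b|.

    Values are assumed to be normalized to [0, 1]. This is a placeholder,
    not the supervised-distance relation from the paper.
    """
    n = len(data)
    return [
        [min(1.0 - abs(a - b) for a, b in zip(data[i], data[j])) for j in range(n)]
        for i in range(n)
    ]


def fuzzy_conditional_entropy(relation: Sequence[Sequence[float]],
                              decisions: Sequence[object]) -> float:
    """H(D|B) = -(1/n) * sum_i log(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|)."""
    n = len(decisions)
    total = 0.0
    for i in range(n):
        # Fuzzy cardinality of the fuzzy granule of x_i under attribute set B.
        card_b = sum(relation[i])
        # Intersection with x_i's crisp decision class: min(R, 1) = R inside it, 0 outside.
        card_bd = sum(relation[i][j] for j in range(n) if decisions[j] == decisions[i])
        total += math.log(card_bd / card_b)
    return -total / n
```

A relation that cannot separate two equally sized decision classes at all (every entry 1) gives log 2, while a relation that separates every sample (the identity) gives 0; a reduction procedure would look for small attribute subsets that keep this value close to that of the full attribute set.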
Abstract: Current high-dimensional feature screening methods still face significant challenges in handling mixed linear and nonlinear relationships, controlling redundant information, and improving model robustness. In this study, we propose a Dynamic Conditional Feature Screening (DCFS) method tailored for high-dimensional economic forecasting tasks. Our goal is to accurately identify key variables, enhance predictive performance, and provide both theoretical foundations and practical tools for macroeconomic modeling. DCFS constructs a comprehensive test statistic by integrating conditional mutual information with conditional regression error differences. Through a dynamic weighting mechanism, DCFS adaptively balances the linear and nonlinear contributions of features during screening. In addition, a dynamic thresholding mechanism is designed to control the false discovery rate (FDR), improving the stability and reliability of the screening results. On the theoretical side, we prove that the proposed method satisfies the sure screening property and rank consistency, ensuring accurate identification of the truly important feature set in high-dimensional settings. Simulation results show that under purely linear, purely nonlinear, and mixed dependency structures, DCFS consistently outperforms classical screening methods such as SIS, CSIS, and IG-SIS in terms of true positive rate (TPR), FDR, and rank correlation, highlighting the accuracy, robustness, and stability of our method. Furthermore, an empirical analysis based on the U.S. FRED-MD macroeconomic dataset confirms the practical value of DCFS in real-world forecasting: DCFS achieves lower prediction errors (RMSE and MAE) and higher R² values in forecasting GDP growth. The selected key variables, including the Industrial Production Index (IP), the Federal Funds Rate, the Consumer Price Index (CPI), and the Money Supply (M2), have clear economic interpretations, offering reliable support for economic forecasting and policy formulation.
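For reference, the SIS baseline mentioned above (sure independence screening) ranks features by the absolute marginal Pearson correlation with the response and keeps the top d. The minimal pure-Python sketch below shows only that baseline; it is not DCFS, whose conditional-mutual-information and regression-error components, dynamic weights, and FDR threshold are not specified in the abstract.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_screen(features: Sequence[Sequence[float]], y: Sequence[float], d: int) -> List[int]:
    """Return indices of the d features with the largest |corr(x_j, y)|.

    `features` is a list of columns, one sequence per feature.
    """
    scores = [abs(pearson(col, y)) for col in features]
    ranked = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]
```

Because SIS scores each feature marginally and linearly, it can miss purely nonlinear effects, which is the gap the abstract's conditional, mixed linear and nonlinear statistic is meant to close.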
Funding: Supported in part by the Sichuan Science and Technology Program under Grant No. 2025ZNSFSC151; in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA27030201; the Natural Science Foundation of China under Grant No. U21B6001; and in part by the Natural Science Foundation of Tianjin under Grant No. 24JCQNJC01930.
Abstract: This work proposes a distributed Kalman filtering (KF) algorithm that cooperatively tracks a time-varying unknown signal process for a stochastic regression model over networked systems. We provide a stability analysis of the proposed distributed KF algorithm without assuming independent and stationary signals, so the theoretical results can be applied to stochastic feedback systems. The main difficulty of the stability analysis lies in analyzing the properties of the product of non-independent and non-stationary random matrices involved in the error equation. We address this with techniques such as stochastic Lyapunov functions, the stability theory of stochastic systems, and algebraic graph theory. The stochastic spatio-temporal cooperative information condition reflects the cooperative property of multiple sensors: even if no local sensor can track the time-varying unknown signal on its own, the distributed KF algorithm can accomplish the filtering task cooperatively. Finally, we illustrate the properties of the proposed distributed KF algorithm with a simulation example.
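To make the cooperative idea concrete, the sketch below runs a generic diffusion-style distributed Kalman filter (local measurement update at each sensor, then averaging of neighbours' estimates) for a random-walk parameter θ_{k+1} = θ_k + w_k observed through scalar regressions y_{i,k} = φ_i^T θ_k + v_{i,k}. This scheme, the combination weights, the noise levels q and r, and the two-sensor setup are all assumptions chosen for illustration; it is not the algorithm analysed in the paper. For a reproducible demo the true θ is held constant and measurements are noise-free.

```python
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def local_update(theta: Vec, P: Mat, phi: Sequence[float], y: float, r: float):
    """Scalar-measurement Kalman update of (theta, P) for y = phi^T theta + v."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    s = sum(phi[i] * Pphi[i] for i in range(n)) + r  # innovation variance
    K = [p / s for p in Pphi]                        # Kalman gain
    innov = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + K[i] * innov for i in range(n)]
    # P - K phi^T P; phi^T P equals Pphi^T because P is symmetric.
    new_P = [[P[i][j] - K[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


def diffusion_kf(phis: Sequence[Sequence[float]], A: Sequence[Sequence[float]],
                 theta_true: Sequence[float], steps: int = 300,
                 q: float = 0.01, r: float = 1.0) -> List[Vec]:
    """Run predict -> local update -> combine for `steps` iterations.

    phis[i] is sensor i's regressor; A[i][j] is the weight sensor i gives
    to sensor j's estimate (rows sum to 1).
    """
    m, n = len(phis), len(theta_true)
    thetas = [[0.0] * n for _ in range(m)]
    Ps = [[[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)] for _ in range(m)]
    for _ in range(steps):
        for s in range(m):
            # Predict: the random-walk model adds q * I to each covariance.
            for i in range(n):
                Ps[s][i][i] += q
            y = sum(a * b for a, b in zip(phis[s], theta_true))
            thetas[s], Ps[s] = local_update(thetas[s], Ps[s], phis[s], y, r)
        # Combine: each sensor averages its neighbours' intermediate estimates.
        thetas = [[sum(A[s][t] * thetas[t][i] for t in range(m)) for i in range(n)]
                  for s in range(m)]
    return thetas
```

With phis = [[1, 0], [0, 1]], each sensor observes only one coordinate of θ, so a sensor running alone (A equal to the identity) never learns the other coordinate; with averaging weights A = [[0.5, 0.5], [0.5, 0.5]] both sensors converge to the full θ, a toy analogue of the cooperative property described above.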
Funding: Scientific Research Foundation for the Returned Overseas Chinese Scholars of China (Grant No. 20041764); Natural Science Foundation of Shandong Province (Grant No. Z2004G01).
Abstract: Heart rate variability may be explained by a low-dimensional governing mechanism, and there has been increasing interest in verifying and understanding the coupling between respiration and heart rate. In this paper, we use a nonlinear detection method to detect the nonlinear deterministic component in physiological time series, using single-variable and two-variable series respectively, and we use the conditional information entropy to analyze the correlations among heart rate, respiration, and blood oxygen concentration. The results show that the heart rate and respiration data contain a nonlinear deterministic component, and that heart rate and respiration are two variables originating from the same underlying dynamics.
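As an illustration of the conditional information entropy used in this kind of analysis, the sketch below computes a plug-in estimate H(X|Y) = H(X,Y) - H(Y) after discretizing each series into equal-width bins. The bin count and the plug-in estimator are assumptions; the abstract does not say how the series were discretized or how entropy was estimated.

```python
import math
from collections import Counter
from typing import List, Sequence


def discretize(series: Sequence[float], bins: int) -> List[int]:
    """Map each value to one of `bins` equal-width bins over [min, max]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in series]


def entropy(symbols: Sequence) -> float:
    """Plug-in Shannon entropy in nats."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())


def conditional_entropy(x: Sequence[float], y: Sequence[float], bins: int = 8) -> float:
    """H(X|Y) = H(X, Y) - H(Y) on binned data; 0 means X is a function of Y."""
    bx, by = discretize(x, bins), discretize(y, bins)
    return entropy(list(zip(bx, by))) - entropy(by)
```

A value of H(X|Y) well below H(X) suggests that knowing one signal (for example respiration) reduces uncertainty about the other (for example heart rate); plug-in estimates are biased for short series, so in practice they are usually compared against surrogate data.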
Funding: Supported by the National Natural Science Foundation of China (Nos. 61073133, 60973067, and 61175053) and the Fundamental Research Funds for the Central Universities of China (No. 2011ZD010).
Abstract: Numerous models have been proposed to reduce the classification error of Naive Bayes by weakening its attribute independence assumption, and some have demonstrated remarkable error performance. Since ensemble learning is an effective way to reduce a classifier's classification error, this paper proposes a double-layer Bayesian classifier ensembles (DLBCE) algorithm based on frequent itemsets. DLBCE constructs a double-layer Bayesian classifier (DLBC) for each frequent itemset contained in the new instance and finally combines all the classifiers, assigning each classifier a weight according to the conditional mutual information. Experimental results show that the proposed algorithm outperforms other leading algorithms.
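The final combination step described above amounts to a weighted vote over the member classifiers' class-probability estimates. The sketch below assumes the per-classifier weights (for example, conditional mutual information scores) have already been computed and are non-negative; how DLBCE derives those weights and builds each double-layer classifier is not reproduced here.

```python
from typing import Dict, Hashable, Sequence


def weighted_ensemble(posteriors: Sequence[Dict[Hashable, float]],
                      weights: Sequence[float]) -> Hashable:
    """Return the class with the largest weight-normalized average posterior."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    combined: Dict[Hashable, float] = {}
    for post, w in zip(posteriors, weights):
        for label, p in post.items():
            combined[label] = combined.get(label, 0.0) + (w / total) * p
    return max(combined, key=combined.get)
```

Down-weighting a single confident member can flip the decision: with posteriors {yes: 0.9}, {yes: 0.3}, {yes: 0.4} (complements on "no"), equal weights predict "yes", while weights [0.1, 1.0, 1.0] predict "no".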
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150, 10926197, and 61201323.
Abstract: Detecting and clarifying cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and lagged causal relationships of time series causal graphs. Unlike conventional constraint-based algorithms, the proposed algorithm does not assume any particular distribution and is nonparametric. These properties are especially appealing for inferring time series causal graphs when prior knowledge about the data model is unavailable. Simulations and a case analysis demonstrate the effectiveness of the method.
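Both quantities in this procedure can be estimated for discrete (or discretized) series with plug-in estimators. The sketch below computes I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) from joint frequency counts and applies it to a lagged pair, asking whether x_{t-1} carries information about y_t beyond y_{t-1}. The plug-in estimator and the lag-one setup are illustrative assumptions; the paper's estimator and the details of its three-step procedure are not given in the abstract.

```python
import math
from collections import Counter
from typing import Sequence


def _H(*cols: Sequence) -> float:
    """Joint plug-in entropy (nats) of one or more aligned discrete columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * math.log(c / n) for c in Counter(rows).values())


def cmi(x: Sequence, y: Sequence, z: Sequence) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return _H(x, z) + _H(y, z) - _H(x, y, z) - _H(z)


def lagged_cmi(x: Sequence, y: Sequence) -> float:
    """I(x_{t-1}; y_t | y_{t-1}): does x's past inform y beyond y's own past?"""
    return cmi(x[:-1], y[1:], y[:-1])
```

If y simply copies x with a one-step delay, lagged_cmi(x, y) is clearly positive, whereas a constant x yields zero; deciding whether a positive estimate is significant still requires a test such as a permutation procedure.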
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60774049 and 40672195), the Natural Science Foundation of Beijing (Grant No. 4062020), the National 973 Fundamental Research Project of China (Grant No. 2002CB312200), and the Youth Foundation of Beijing Normal University.
Abstract: The information content of rules is categorized into inner mutual information content and outer impartation information content. The conventional objective interestingness measures based on information theory all measure inner mutual information, which represents the confidence of rules and the mutual information between antecedent and consequent. Moreover, almost all of these measures overlook the outer impartation information, which is conveyed to the user and helps the user make decisions. We put forward the view that the outer impartation information content of rules and rule sets can be represented by relations from the input universe to the output universe. With binary relations, the interaction of rules in a rule set can be easily represented by the union and intersection operators. Based on the entropy of relations, the outer impartation information content of rules and rule sets is measured. Then, the conditional information content of rules and rule sets, the independence of rules and rule sets, and the inconsistent knowledge of rule sets are defined and measured. The properties of these new measures are discussed and several notable results are proven; for example, the information content of a rule set may exceed the sum of the information content of its rules, and the conditional information content of rules may be negative. Finally, applications of these new measures are discussed: a new method for evaluating rule mining algorithms and two rule pruning algorithms, λ-choice and RPClC, are put forward. These new methods and algorithms are better suited to meeting the need for more efficient decision information.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150 and 10926197.
Abstract: Detecting and clarifying cause-effect relationships among variables is an important problem in time series analysis. Traditional causality inference methods have a salient limitation: the model must be linear with Gaussian noise. Although additive model regression can effectively infer the nonlinear causal relationships of additive nonlinear time series, it requires the contemporaneous causal relationships of variables to be linear, and it is not always valid for testing conditional independence relations. This paper provides a nonparametric method that employs both mutual information and conditional mutual information to identify the causal structure of a class of nonlinear time series models, extending additive nonlinear time series to nonlinear structural vector autoregressive models. An algorithm is developed to learn the contemporaneous and lagged causal relationships of variables. Simulations demonstrate the effectiveness of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60375003 and 60972150) and the Science and Technology Innovation Foundation of Northwestern Polytechnical University (Grant No. 2007KJ01033).
Abstract: The general mutual information (GMI) and general conditional mutual information (GCMI) are considered for measuring lag dependence in nonlinear time series. Both measures are invariant under transformation. The statistics based on GMI and GCMI are estimated using the correlation integral. Under the hypothesis of independent series, the estimators have Gaussian asymptotic distributions. Simulations on generated nonlinear series show that the methods frequently find the correct lags.
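One common way to estimate an order-2 (Rényi) mutual information from correlation integrals is I_2(r) = log C_XY(r) - log C_X(r) - log C_Y(r), where C(r) is the fraction of point pairs closer than r (max norm in the joint space). The sketch below applies this to a series and its lag-τ copy. The order-2 form, the radius r, and the max norm are assumptions; the abstract does not give the exact GMI/GCMI definitions or the form of the test statistic.

```python
import math
from typing import Sequence


def correlation_integral(points: Sequence[Sequence[float]], r: float) -> float:
    """Fraction of distinct pairs whose max-norm distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) < r:
                close += 1
    return close / (n * (n - 1) / 2)


def gmi_at_lag(series: Sequence[float], lag: int, r: float) -> float:
    """Order-2 mutual information between x_t and x_{t+lag}, in nats.

    r must be large enough that every correlation integral is positive.
    """
    x = [[v] for v in series[:-lag]]
    y = [[v] for v in series[lag:]]
    xy = [a + b for a, b in zip(x, y)]  # joint embedding (x_t, x_{t+lag})
    cx, cy, cxy = (correlation_integral(p, r) for p in (x, y, xy))
    return math.log(cxy) - math.log(cx) - math.log(cy)
```

Scanning gmi_at_lag over candidate lags and looking for peaks is one way to locate lag dependence; the O(n²) pair loop is fine for short series but would need a neighbour-search structure for long ones.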
Funding: Supported by the Innovation Program for Quantum Science and Technology (2021ZD0301701) and the National Natural Science Foundation of China (12175104).
Abstract: Correctly quantifying the direct association between variables from observed data is a valuable topic of study. On the one hand, many traditional methods can measure only linear direct association. On the other hand, certain existing measures of direct association between two variables become unstable when a parent variable strongly influences both variables. To solve these issues, we propose a measure, the independent conditional mutual information (ICMI), to quantify the direct association between two variables in a three-variable network. We use simulated data to compare the stability and reliability of ICMI numerically with those of other measures of direct association under different conditions. The numerical results show that in many cases ICMI is more stable than known measures such as unique information, conditional mutual information, and partial correlation, and the statistical power results show that ICMI is more reliable across different functional forms. We further use our measure to analyze a network consisting of family finance, social security, and the residence of senior citizens.
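For comparison, partial correlation, one of the baseline measures mentioned above, can be written for three variables as ρ_{XY·Z} = (ρ_XY - ρ_XZ ρ_YZ) / sqrt((1 - ρ_XZ²)(1 - ρ_YZ²)). The sketch below evaluates this formula only; it does not implement ICMI. Note that the denominator shrinks toward zero as the parent Z becomes strongly correlated with both X and Y, which plausibly relates to the instability the abstract describes.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y given Z, from pairwise correlations."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ZeroDivisionError("Z is perfectly correlated with X or Y")
    return (r_xy - r_xz * r_yz) / denom
```

With ρ_XZ = ρ_YZ = 0.5, raising ρ_XY by 0.01 above 0.25 moves the partial correlation only to about 0.013; with ρ_XZ = ρ_YZ = 0.99, raising ρ_XY by just 0.001 above 0.9801 moves it to about 0.05, so small estimation errors are strongly amplified.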
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60375003) and the Chinese Aviation Foundation (Grant No. 03153059).
Abstract: An information-theoretic method is proposed to test Granger causality and contemporaneous conditional independence in Granger causality graph models. In these graphs, the vertex set denotes the component series of the multivariate time series, directed edges denote causal dependence, and undirected edges reflect instantaneous dependence. The presence of an edge is measured by a statistic based on conditional mutual information and tested by a permutation procedure. Furthermore, for the relations found to exist, a statistic based on the difference between general conditional mutual information and linear conditional mutual information is proposed to test for nonlinearity; the significance of this nonlinearity statistic is determined by a bootstrap method based on surrogate data. We investigate the finite-sample behavior of the procedure through simulated time series with different dependence structures, including linear and nonlinear relations.
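A permutation step of this kind can be sketched generically: compute a dependence statistic, recompute it after shuffling the candidate cause X within groups of equal conditioning values Z (which keeps the relationship between X and Z while breaking any link between X and Y given Z), and report the fraction of permuted statistics at least as large as the observed one. The within-stratum shuffle, a discrete Z, and the pluggable statistic are assumptions for illustration; the paper uses a conditional-mutual-information statistic, and its exact permutation scheme is not described in the abstract.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence


def conditional_permutation_pvalue(
    stat: Callable[[Sequence, Sequence, Sequence], float],
    x: Sequence, y: Sequence, z: Sequence[Hashable],
    n_perm: int = 199, seed: int = 0,
) -> float:
    """p-value = (1 + #{permuted stat >= observed}) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = stat(x, y, z)
    # Group sample indices by their (discrete) conditioning value.
    strata = defaultdict(list)
    for i, zi in enumerate(z):
        strata[zi].append(i)
    exceed = 0
    for _ in range(n_perm):
        xp: List = list(x)
        for idx in strata.values():
            # Shuffle x only among samples that share the same z value.
            shuffled = [x[i] for i in idx]
            rng.shuffle(shuffled)
            for i, v in zip(idx, shuffled):
                xp[i] = v
        if stat(xp, y, z) >= observed:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)
```

Any statistic with the signature stat(x, y, z), for instance a plug-in conditional mutual information estimate, can be passed in; the "+1" terms keep the p-value strictly positive, and the smallest attainable value is 1 / (n_perm + 1).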