Journal Articles
6 articles found
1. Model-Free Feature Screening via Maximal Information Coefficient (MIC) for Ultrahigh-Dimensional Multiclass Classification
Authors: Tingting Chen, Guangming Deng. Open Journal of Statistics, 2023, Issue 6, pp. 917-940 (24 pages).
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that all variables are continuous, which limits their applicability in this mixed setting. To address this issue, we propose a model-free feature screening approach for ultrahigh-dimensional multiclass classification that handles both categorical and continuous variables. The method uses the Maximal Information Coefficient (MIC) to assess the predictive power of each variable. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. Simulation studies and real data analyses demonstrate its finite-sample performance. In summary, the proposed method offers an effective way to screen features in ultrahigh-dimensional datasets with a mixture of categorical and continuous covariates.
Keywords: ultrahigh-dimensional; feature screening; model-free; Maximal Information Coefficient (MIC); multiclass classification
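As a rough illustration of the idea described in the abstract, the sketch below scores each covariate against a categorical response with a simplified MIC: mutual information maximized over equal-frequency binnings of the covariate, normalized by the log of the smaller grid dimension. This is a toy proxy under stated assumptions, not the authors' procedure or the full MINE grid search; the function names, binning scheme, and `max_bins` cap are all illustrative choices.

```python
import numpy as np

def mic_score(x, y, max_bins=10):
    """Simplified MIC between a continuous covariate x and a categorical
    response y: maximize normalized mutual information over
    equal-frequency binnings of x (a sketch, not the full MIC search)."""
    n = len(x)
    classes, y_idx = np.unique(y, return_inverse=True)
    best = 0.0
    for b in range(2, max_bins + 1):
        # interior equal-frequency bin edges for x
        edges = np.quantile(x, np.linspace(0, 1, b + 1)[1:-1])
        x_idx = np.searchsorted(edges, x)
        # joint frequency table of (binned x, y)
        joint = np.zeros((b, len(classes)))
        for i in range(n):
            joint[x_idx[i], y_idx[i]] += 1
        joint /= n
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = joint * np.log2(joint / (px * py))
        mi = np.nansum(terms)
        best = max(best, mi / np.log2(min(b, len(classes))))
    return best

def screen_by_mic(X, y, top_k):
    """Rank covariates by the MIC-style utility and keep the top k."""
    scores = np.array([mic_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k], scores
```

In a screening workflow one would keep the `top_k` covariates and pass them to a downstream classifier; the sure screening property discussed in the paper concerns the probability that this retained set contains all truly active covariates.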
2. Model-Free Feature Screening Based on Gini Impurity for Ultrahigh-Dimensional Multiclass Classification
Authors: Zhongzheng Wang, Guangming Deng. Open Journal of Statistics, 2022, Issue 5, pp. 711-732 (22 pages).
It is quite common for both categorical and continuous covariates to appear in data, but most feature screening methods for ultrahigh-dimensional classification assume the covariates are continuous, and applicable methods remain very limited. To handle this non-trivial situation, we propose a model-free feature screening procedure for ultrahigh-dimensional multiclass classification with both categorical and continuous covariates. The procedure uses Gini impurity to evaluate the predictive power of the covariates. Under certain regularity conditions, we prove that it possesses the sure screening and ranking consistency properties. We demonstrate its finite-sample performance through simulation studies and illustrate it with a real data analysis.
Keywords: ultrahigh-dimensional; feature screening; model-free; Gini impurity; multiclass classification
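A Gini-impurity screening utility of the kind the abstract describes can be sketched as the drop in impurity of the response after conditioning on a discretized covariate. The binning scheme and function names below are illustrative assumptions; categorical covariates could be used without binning.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a categorical vector: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(x, y, n_bins=10):
    """Screening utility: reduction in Gini impurity of y after
    conditioning on an equal-frequency binning of x."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    cells = np.searchsorted(edges, x)
    base = gini(y)
    cond = 0.0
    for c in np.unique(cells):
        mask = cells == c
        cond += mask.mean() * gini(y[mask])  # weighted within-bin impurity
    return base - cond

def screen_by_gini(X, y, top_k):
    """Rank covariates by Gini gain and keep the top k."""
    scores = np.array([gini_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k], scores
```

A covariate unrelated to the response yields a gain near zero, while an informative covariate concentrates each bin on few classes and yields a large gain.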
3. Model-Free Ultra-High-Dimensional Feature Screening for Multi-Classified Response Data Based on Weighted Jensen-Shannon Divergence
Authors: Qingqing Jiang, Guangming Deng. Open Journal of Statistics, 2023, Issue 6, pp. 822-849 (28 pages).
In ultra-high-dimensional data, it is common for the response variable to be multi-classified. This paper therefore proposes a model-free screening method for covariates when the response is multi-classified, introducing the Jensen-Shannon divergence to measure covariate importance. The idea is to compute the Jensen-Shannon divergence between the conditional probability distribution of a covariate given the response and its unconditional distribution, and then use the response-class probabilities as weights to form a weighted Jensen-Shannon divergence; a larger weighted divergence indicates a more important covariate. We also investigate an adapted version that adjusts the weighted Jensen-Shannon divergence by a logarithmic factor of the number of categories, for use when the number of categories varies across covariates. Both theoretical and simulation results demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, simulation and real-dataset experiments show that, for feature screening, the proposed methods are more robust and computationally faster than an existing method.
Keywords: ultra-high-dimensional; multi-classified; weighted Jensen-Shannon divergence; model-free; feature screening
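The weighted utility the abstract describes can be sketched for a categorical covariate as follows: the Jensen-Shannon divergence between each conditional distribution P(x | Y = r) and the marginal P(x), averaged with weights P(Y = r). This is a minimal sketch under the assumption of a discrete covariate; the paper's adjustment by a log factor of the category count is omitted, and the helper names are invented for illustration.

```python
import numpy as np

def _kl(p, q):
    """Kullback-Leibler divergence restricted to the support of p."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def weighted_js_score(x, y):
    """Weighted JS utility for a categorical covariate x and multiclass
    response y: sum over classes r of P(Y=r) * JS(P(x|Y=r), P(x))."""
    x_vals = np.unique(x)
    marginal = np.array([(x == v).mean() for v in x_vals])
    score = 0.0
    for r in np.unique(y):
        w = (y == r).mean()
        cond = np.array([(x[y == r] == v).mean() for v in x_vals])
        score += w * js_divergence(cond, marginal)
    return score
```

A covariate whose distribution shifts across response classes gets a large score; one independent of the response scores near zero, which is what makes the quantity usable as a ranking statistic.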
4. Dynamic Conditional Feature Screening: A High-Dimensional Feature Selection Method Based on Mutual Information and Regression Error
Authors: Yi Zhao, Guangming Deng. Open Journal of Statistics, 2025, Issue 2, pp. 199-242 (44 pages).
Current high-dimensional feature screening methods still face significant challenges in handling mixed linear and nonlinear relationships, controlling redundant information, and improving model robustness. In this study, we propose a Dynamic Conditional Feature Screening (DCFS) method tailored for high-dimensional economic forecasting tasks. Our goal is to accurately identify key variables, enhance predictive performance, and provide both theoretical foundations and practical tools for macroeconomic modeling. DCFS constructs a comprehensive test statistic by integrating conditional mutual information with conditional regression error differences. A dynamic weighting mechanism adaptively balances the linear and nonlinear contributions of features during screening, and a dynamic thresholding mechanism effectively controls the false discovery rate (FDR), improving the stability and reliability of the screening results. On the theoretical front, we rigorously prove that the proposed method satisfies the sure screening property and rank consistency, ensuring accurate identification of the truly important feature set in high-dimensional settings. Simulation results demonstrate that under purely linear, purely nonlinear, and mixed dependency structures, DCFS consistently outperforms classical screening methods such as SIS, CSIS, and IG-SIS in terms of true positive rate (TPR), false discovery rate (FDR), and rank correlation, highlighting its accuracy, robustness, and stability. Furthermore, an empirical analysis based on the U.S. FRED-MD macroeconomic dataset confirms the practical value of DCFS: it achieves lower prediction errors (RMSE and MAE) and higher R2 values in forecasting GDP growth, and the selected key variables, including the Industrial Production Index (IP), Federal Funds Rate, Consumer Price Index (CPI), and Money Supply (M2), possess clear economic interpretability, offering reliable support for economic forecasting and policy formulation.
Keywords: high-dimensional feature screening; conditional mutual information; regression error difference; dynamic weighting; dynamic thresholding; macroeconomic forecasting; FRED-MD dataset
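A loose sketch of the statistic's two ingredients can be written down directly: a nonlinear term (mutual information estimated by binning) plus a linear term (the reduction in squared error from a simple regression). The fixed weight `w` below stands in for the paper's dynamic weighting, and the conditioning set and thresholding machinery are omitted entirely; all names are illustrative.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Crude mutual information between two continuous vectors via
    equal-frequency binning of each (nonlinear-signal term)."""
    def binize(v):
        edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.searchsorted(edges, v)
    xi, yi = binize(x), binize(y)
    joint = np.zeros((bins, bins))
    for a, b in zip(xi, yi):
        joint[a, b] += 1
    joint /= len(x)
    px = joint.sum(1, keepdims=True)
    py = joint.sum(0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = joint * np.log(joint / (px * py))
    return np.nansum(t)

def error_reduction(x, y):
    """Relative drop in squared error from regressing y on x
    (linear-signal term, i.e. R^2 of a simple linear fit)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

def dcfs_statistic(x, y, w=0.5):
    """Simplified DCFS-style utility: weighted mix of the nonlinear and
    linear terms. The paper's dynamic weighting is replaced by a fixed w."""
    return w * binned_mi(x, y) + (1 - w) * error_reduction(x, y)
```

Screening would then rank covariates by this statistic and retain those above a threshold; the paper's contribution lies in choosing the weight and threshold adaptively, which this sketch does not attempt.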
5. Sufficient dimension reduction in the presence of controlling variables (cited 1 time)
Authors: Guoliang Fan, Liping Zhu. Science China Mathematics (SCIE, CSCD), 2022, Issue 9, pp. 1975-1996 (22 pages).
We are concerned with partial dimension reduction for the conditional mean function in the presence of controlling variables. We suggest a profile least squares approach to perform partial dimension reduction for a general class of semi-parametric models. The asymptotic properties of the resulting estimates of the central partial mean subspace and the mean function are provided. In addition, a Wald-type test is proposed to evaluate a linear hypothesis about the central partial mean subspace, and a generalized likelihood ratio test is constructed to check whether the nonparametric mean function has a specific parametric form. These tests can be used to evaluate whether there exist interactions between the covariates and the controlling variables, and if so, in what form. A Bayesian information criterion (BIC)-type criterion is applied to determine the structural dimension of the central partial mean subspace, and its consistency is established. Numerical studies through simulations and real data examples demonstrate the power and utility of the proposed semi-parametric approaches.
Keywords: central partial mean subspace; controlling variable; hypothesis test; semi-parametric regression; sufficient dimension reduction
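For readers unfamiliar with sufficient dimension reduction, the classical sliced inverse regression (SIR) estimator below shows the basic object being estimated: a few directions in covariate space that capture the regression information. This is a standard textbook baseline, NOT the authors' profile-least-squares partial SDR with controlling variables; slice count and names are illustrative choices.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Classical sliced inverse regression: estimate sufficient-dimension-
    reduction directions from the slice means of standardized covariates."""
    n, p = X.shape
    # whiten X
    mu = X.mean(0)
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    Z = (X - mu) @ inv_sqrt
    # slice observations by the order of y and accumulate slice means
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(0)
        M += (len(idx) / n) * np.outer(m, m)
    # leading eigenvectors of M, mapped back to the original X scale
    vals, vecs = np.linalg.eigh(M)
    dirs = inv_sqrt @ vecs[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)
```

Partial SDR methods such as the paper's keep the controlling variables out of the reduction and estimate directions only for the covariates of interest; SIR is the unconditional building block they refine.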
6. Optimal subsampling for principal component analysis
Authors: Xuehu Zhu, Weixuan Yuan, Zongben Xu, Wenlin Dai. Science China Mathematics, 2025, Issue 12, pp. 2993-3016 (24 pages).
Principal component analysis (PCA) is ubiquitous in statistics and machine learning. It is frequently used as an intermediate step in regression and classification problems to reduce the dimensionality of datasets. However, as datasets become extremely large, direct application of PCA may not be feasible, since loading and storing massive datasets may exceed the computational ability of common machines. To address this problem, subsampling is usually performed, in which a small proportion of the data serves as a surrogate for the entire dataset. This paper proposes an A-optimal subsampling algorithm to decrease the computational cost of PCA for super-large datasets. Specifically, we establish the consistency and asymptotic normality of the eigenvectors of the subsampled covariance matrix, and then derive the optimal subsampling probabilities for PCA based on the A-optimality criterion. We validate the theoretical results through extensive simulation studies. Moreover, the proposed subsampling algorithm is embedded into a classification procedure for handwriting data to assess its effectiveness in real-world applications.
Keywords: big data; dimensionality reduction; optimal subsampling; principal component analysis
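The general subsampled-PCA pipeline the abstract describes can be sketched as: draw a small set of rows with non-uniform probabilities, reweight them so the covariance estimate stays unbiased, then eigendecompose. The squared-norm sampling probabilities below are a common importance-sampling placeholder, not the paper's A-optimal probabilities, and the function name is an illustrative assumption.

```python
import numpy as np

def subsampled_pca(X, r, n_components=2, rng=None):
    """Subsampling sketch for PCA on a tall matrix X: sample r rows with
    probabilities proportional to their squared norms, reweight, and
    eigendecompose the subsample covariance estimate."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Xc = X - X.mean(0)
    probs = (Xc ** 2).sum(1)
    probs = probs / probs.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # inverse-probability weights keep E[S] equal to the full covariance
    w = 1.0 / (n * r * probs[idx])
    S = (Xc[idx] * w[:, None]).T @ Xc[idx]
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, ::-1][:, :n_components]
```

Only the r sampled rows are touched after the centering pass, which is where the computational savings for super-large datasets come from; the paper's contribution is choosing the probabilities to minimize the asymptotic variance of the eigenvectors under the A-optimality criterion.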