Journal Articles
11 articles found
Model-Free Ultra-High-Dimensional Feature Screening for Multi-Classified Response Data Based on Weighted Jensen-Shannon Divergence
1
Authors: Qingqing Jiang, Guangming Deng. Open Journal of Statistics, 2023, Issue 6, pp. 822-849 (28 pages)
In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free feature screening method for multi-class responses that uses the Jensen-Shannon divergence to measure the importance of covariates. The method computes the Jensen-Shannon divergence between the conditional distribution of a covariate given each response class and the unconditional distribution of that covariate, and then weights these divergences by the class probabilities of the response to obtain a weighted Jensen-Shannon divergence; a larger weighted divergence indicates a more important covariate. We also investigate an adapted version for the case where the covariates have different numbers of categories: the relationship between a covariate and the response is measured by the weighted Jensen-Shannon divergence adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments show that the proposed methods possess the sure screening and ranking consistency properties. Finally, simulation and real-data experiments show that, in feature screening, the proposed methods perform robustly and compute faster than an existing method.
Keywords: Ultra-High-Dimensional; Multi-Classified; Weighted Jensen-Shannon Divergence; Model-Free; Feature Screening
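As a rough illustration of the screening index described in the abstract above, the sketch below computes a weighted Jensen-Shannon divergence for a categorical covariate and a multi-class response from empirical frequencies. It is a minimal sketch, not the authors' implementation: the function names are hypothetical, natural logarithms are assumed, and the adjusted variant simply divides by log(J), where J is the number of covariate categories, which is only one plausible reading of "adjusted by the logarithmic factor of the number of categories".

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute 0; b = m is positive wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weighted_jsd(x, y, adjust=False):
    """Hypothetical weighted JSD index for a categorical x and a multi-class y."""
    x = np.asarray(x)
    y = np.asarray(y)
    levels = np.unique(x)
    # Unconditional distribution of the covariate, P(X = k).
    p_x = np.array([np.mean(x == k) for k in levels])
    score = 0.0
    for r in np.unique(y):
        in_class = y == r
        weight = in_class.mean()  # empirical P(Y = r), used as the weight
        # Conditional distribution P(X = k | Y = r).
        p_x_given_r = np.array([np.mean(x[in_class] == k) for k in levels])
        score += weight * js_divergence(p_x_given_r, p_x)
    if adjust and len(levels) > 1:
        # Assumed adjustment: divide by log(J), J = number of covariate categories.
        score /= np.log(len(levels))
    return score


# Usage: score every column of a categorical design matrix and keep the top d.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.integers(0, 4, size=(300, 50))
X[:, 0] = (y + rng.integers(0, 2, size=300)) % 4  # column 0 depends on y
scores = np.array([weighted_jsd(X[:, j], y) for j in range(X.shape[1])])
top_d = np.argsort(scores)[::-1][:5]
```

Screening then amounts to ranking the scores and keeping the top d columns; because only frequency counts are involved, the cost is linear in the sample size for each covariate.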
Model-Free Feature Screening Based on Gini Impurity for Ultrahigh-Dimensional Multiclass Classification
2
Authors: Zhongzheng Wang, Guangming Deng. Open Journal of Statistics, 2022, Issue 5, pp. 711-732 (22 pages)
It is quite common for data to contain both categorical and continuous covariates. However, most feature screening methods for ultrahigh-dimensional classification assume that the covariates are continuous, and feature screening methods applicable to the mixed case are very limited. To handle this non-trivial situation, we propose a model-free feature screening procedure for ultrahigh-dimensional multiclass classification with both categorical and continuous covariates. The proposed method uses Gini impurity to evaluate the predictive power of covariates. Under certain regularity conditions, we prove that the proposed screening procedure possesses the sure screening property and ranking consistency properties. We demonstrate the finite-sample performance of the procedure through simulation studies and illustrate it with a real data analysis.
Keywords: Ultrahigh-Dimensional; Feature Screening; Model-Free; Gini Impurity; Multiclass Classification
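One way to read "Gini impurity to evaluate the predictive power of covariates" in the abstract above is as the drop in the Gini impurity of the class label after conditioning on the covariate. The sketch below assumes that reading; the function names are hypothetical, and slicing continuous covariates into equal-frequency bins is an assumption of this sketch rather than a detail confirmed by the paper.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_r p_r^2 of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def gini_screening_index(x, y, n_bins=None):
    """Hypothetical index: drop in the Gini impurity of y after conditioning on x.

    Categorical x is used as-is; pass n_bins for a continuous x.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if n_bins is not None:
        # Continuous covariate: slice into roughly equal-frequency bins (assumption).
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        x = np.searchsorted(edges, x, side="right")
    conditional = 0.0
    for k in np.unique(x):
        mask = x == k
        conditional += mask.mean() * gini(y[mask])  # P(X = k) * Gini(Y | X = k)
    return gini(y) - conditional
```

For example, a covariate that separates two balanced classes perfectly scores 0.5 (the full impurity of y), while an independent covariate scores close to 0.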
Model-Free Feature Screening via Maximal Information Coefficient (MIC) for Ultrahigh-Dimensional Multiclass Classification
3
Authors: Tingting Chen, Guangming Deng. Open Journal of Statistics, 2023, Issue 6, pp. 917-940 (24 pages)
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability in this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multiclass classification that can handle both categorical and continuous variables. The method uses the Maximal Information Coefficient (MIC) to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening property and ranking consistency properties. To validate the approach, we conduct simulation studies and provide real data analysis examples that demonstrate its finite-sample performance. In summary, the proposed method offers an effective way to screen features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
Keywords: Ultrahigh-Dimensional; Feature Screening; Model-Free; Maximal Information Coefficient (MIC); Multiclass Classification
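The full MIC statistic searches over many grid partitions of the data; the sketch below is a much cruder stand-in, meant only to show the idea of "maximal normalised mutual information over grids" for a continuous covariate and a categorical response. It assumes the class labels fix one axis of the grid, tries only equal-frequency bins on the covariate axis, and uses the grid-size bound B(n) = n^0.6. The function names are hypothetical, and this is not the estimator used in the paper.

```python
import numpy as np


def mutual_information(a, b):
    """Plug-in mutual information (natural log) of two discrete label vectors."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx.ravel(), b_idx.ravel()), 1.0)  # contingency table
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def approx_mic(x, y, alpha=0.6):
    """Crude MIC-style score: best normalised MI over equal-frequency bins of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n_classes = len(np.unique(y))
    if n_classes < 2:
        return 0.0
    max_cells = int(len(x) ** alpha)  # grid-size bound B(n) = n**alpha
    best = 0.0
    for a in range(2, max_cells // n_classes + 1):
        edges = np.quantile(x, np.linspace(0.0, 1.0, a + 1)[1:-1])
        x_binned = np.searchsorted(edges, x, side="right")
        # MI on an a-by-R grid is at most log(min(a, R)), so the score lies in [0, 1].
        score = mutual_information(x_binned, y) / np.log(min(a, n_classes))
        best = max(best, score)
    return best
```

A covariate that splits the classes perfectly at its median reaches the maximum score of 1, while a covariate unrelated to the label stays near 0; screening would rank covariates by this score.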
Dynamic Conditional Feature Screening: A High-Dimensional Feature Selection Method Based on Mutual Information and Regression Error
4
Authors: Yi Zhao, Guangming Deng. Open Journal of Statistics, 2025, Issue 2, pp. 199-242 (44 pages)
Current high-dimensional feature screening methods still face significant challenges in handling mixed linear and nonlinear relationships, controlling redundant information, and improving model robustness. In this study, we propose a Dynamic Conditional Feature Screening (DCFS) method tailored to high-dimensional economic forecasting tasks. Our goal is to accurately identify key variables, enhance predictive performance, and provide both theoretical foundations and practical tools for macroeconomic modeling. DCFS constructs a comprehensive test statistic that integrates conditional mutual information with conditional regression error differences. Through a dynamic weighting mechanism, it adaptively balances the linear and nonlinear contributions of features during screening. In addition, a dynamic thresholding mechanism is designed to control the false discovery rate (FDR), improving the stability and reliability of the screening results. On the theoretical side, we rigorously prove that the proposed method satisfies the sure screening property and rank consistency, ensuring accurate identification of the truly important features in high-dimensional settings. Simulation results show that under purely linear, purely nonlinear, and mixed dependency structures, DCFS consistently outperforms classical screening methods such as SIS, CSIS, and IG-SIS in terms of true positive rate (TPR), FDR, and rank correlation, indicating superior accuracy, robustness, and stability. Furthermore, an empirical analysis of the U.S. FRED-MD macroeconomic dataset confirms the practical value of DCFS in real-world forecasting: DCFS achieves lower prediction errors (RMSE and MAE) and higher R² values when forecasting GDP growth. The selected key variables, including the Industrial Production Index (IP), the Federal Funds Rate, the Consumer Price Index (CPI), and Money Supply (M2), have clear economic interpretations, offering reliable support for economic forecasting and policy formulation.
Keywords: High-Dimensional Feature Screening; Conditional Mutual Information; Regression Error Difference; Dynamic Weighting; Dynamic Thresholding; Macroeconomic Forecasting; FRED-MD Dataset
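The abstract above does not give the exact form of the DCFS statistic, so rather than guess it, the sketch below shows how the evaluation criteria it reports are conventionally computed: TPR and FDR of a selected feature set against the true active set, and RMSE, MAE and R² of point forecasts. The function names are hypothetical, and these are the standard textbook definitions, not necessarily the paper's exact evaluation code.

```python
import numpy as np


def screening_rates(selected, true_set):
    """TPR and FDR of a selected feature index set against the true active set."""
    selected, true_set = set(selected), set(true_set)
    true_pos = len(selected & true_set)
    tpr = true_pos / len(true_set) if true_set else 0.0
    fdr = (len(selected) - true_pos) / len(selected) if selected else 0.0
    return tpr, fdr


def forecast_errors(y_true, y_pred):
    """RMSE, MAE and R^2 of point forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = 1.0 - float(np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, r2
```

For instance, selecting features {0, 1, 2, 7} when the true set is {0, 1, 2, 3} gives TPR = 0.75 and FDR = 0.25.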
Model-free feature screening for high-dimensional survival data (cited by 3)
5
Authors: Yuanyuan Lin, Xianhui Liu, Meiling Hao. Science China Mathematics (SCIE, CSCD), 2018, Issue 9, pp. 1617-1636 (20 pages)
With the rapid growth in the size of scientific data across disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale in many scientific fields. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring. It has several advantages: it is model-free under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables; and under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimensions. In particular, we impose a conditional independence assumption on the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of both the response and the covariates. Moreover, we propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors or the response. The proposed methods do not require any complicated numerical optimization, so they are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively across various configurations, and their application is illustrated with an analysis of a genetic dataset.
Keywords: feature screening; random censoring; robustness; sure independence screening; ultra-high dimension
Stable correlation and robust feature screening (cited by 2)
6
Authors: Xu Guo, Runze Li, Wanjun Liu, Lixing Zhu. Science China Mathematics (SCIE, CSCD), 2022, Issue 1, pp. 153-168 (16 pages)
In this paper, we propose a new correlation, called stable correlation, to measure the dependence between two random vectors. The new correlation is well defined without any moment condition and is zero if and only if the two random vectors are independent. We also study its other theoretical properties. Based on the new correlation, we further propose a robust model-free feature screening procedure for ultrahigh-dimensional data and establish its sure screening property and rank consistency property without imposing the sub-exponential or sub-Gaussian tail condition that is commonly required in the feature screening literature. We also examine the finite-sample performance of the proposed procedure via Monte Carlo simulation studies and illustrate it with a real data example.
Keywords: feature screening; nonlinear dependence; stable correlation; sure screening property
Feature Screening for High-Dimensional Survival Data via Censored Quantile Correlation (cited by 1)
7
Authors: XU Kai, HUANG Xudong. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2021, Issue 3, pp. 1207-1224 (18 pages)
This paper proposes a new sure independence screening procedure for high-dimensional survival data based on censored quantile correlation (CQC). The framework has two distinctive features: 1) by incorporating a weighting scheme, the metric is a natural extension of the quantile correlation (QC) considered by Li (2015) to high-dimensional survival data; 2) the method is not only robust against outliers but can also discover nonlinear relationships between the independent variables and the censored dependent variable. Additionally, the proposed method enjoys the sure screening property under certain technical conditions. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors.
Keywords: censored quantile correlation; feature screening; high-dimensional survival data; rank consistency property; sure screening property
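The censored quantile correlation in the abstract above extends the uncensored quantile correlation through a weighting scheme whose details are not given here. As background only, the sketch below computes the uncensored sample quantile correlation, qcor_τ(Y, X) = E[ψ_τ(Y − Q_τ)(X − EX)] / sqrt((τ − τ²) var(X)) with ψ_τ(w) = τ − I(w < 0) and Q_τ the τ-th quantile of Y, for fully observed data. It does not implement the censoring weights, and the function name is hypothetical.

```python
import numpy as np


def quantile_correlation(y, x, tau=0.5):
    """Sample quantile correlation qcor_tau(Y, X) for fully observed (uncensored) data."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    q = np.quantile(y, tau)                     # sample tau-th quantile of Y
    psi = tau - (y - q < 0).astype(float)       # psi_tau(w) = tau - I(w < 0)
    qcov = np.mean(psi * (x - x.mean()))        # sample quantile covariance
    return float(qcov / np.sqrt((tau - tau ** 2) * x.var()))


# Screening idea: rank covariates by |qcor| and keep the largest ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 3] + rng.standard_t(df=2, size=200)  # heavy-tailed noise
ranking = np.argsort([-abs(quantile_correlation(y, X[:, j])) for j in range(20)])
```

Because ψ_τ only uses the sign of Y − Q_τ, extreme values of the response have bounded influence, which is the source of the robustness to outliers mentioned in the abstract.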
A fast, accurate and dense feature matching algorithm for aerial images (cited by 2)
8
Authors: LI Ying, GONG Guanghong, SUN Lin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2020, Issue 6, pp. 1128-1139 (12 pages)
Three-dimensional (3D) reconstruction based on aerial images has broad prospects, and feature matching is one of its important steps. However, for high-resolution aerial images, traditional algorithms usually suffer from long running times, mismatches, and sparse feature pairs. Therefore, an algorithm is proposed to achieve fast, accurate, and dense feature matching. It consists of four steps. First, the image resolution is appropriately reduced to balance feature matching time against the number of matching pairs. Second, to further screen out mismatches, a feature screening algorithm based on similarity judgment or local optimization is proposed. Third, to make the algorithm more widely applicable, the results of different algorithms are combined to obtain dense results. Finally, all matching feature pairs found in the low-resolution images are restored to the original images. Comparisons between the original algorithms and the proposed algorithm show that it effectively reduces matching time, screens out mismatches, and increases the number of matches.
Keywords: feature matching; feature screening; feature fusion; aerial image; three-dimensional (3D) reconstruction
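The abstract above does not specify its similarity-based mismatch filter, so the sketch below swaps in a widely used stand-in, Lowe's nearest-neighbour ratio test, together with the coordinate restoration of the final step (scaling keypoints found on the downscaled image back up by the downscaling factor). Both function names are hypothetical, the descriptors are assumed to be plain float vectors, and this is an illustration of the general steps, not the paper's algorithm.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching filtered by Lowe's ratio test (desc_b needs >= 2 rows)."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(desc_a.shape[0])
    best = dist[rows, order[:, 0]]
    second = dist[rows, order[:, 1]]
    # Keep a match only if it is clearly better than the runner-up.
    keep = best < ratio * second
    return np.stack([rows[keep], order[keep, 0]], axis=1)


def restore_coordinates(points_low, scale):
    """Map (x, y) keypoints found on a downscaled image back to the original image.

    scale: original size divided by reduced size (e.g. 4 for a 1/4 downscale).
    """
    return np.asarray(points_low, dtype=float) * scale
```

The ratio test discards ambiguous matches whose best and second-best candidates are almost equally close, which is one common way to "screen out mismatches"; the simple multiplication in the restoration step ignores sub-pixel centre conventions.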
Improving the performance of machine learning algorithms for detection of individual pests and beneficial insects using feature selection techniques
9
Authors: Rabiu Aminu, Samantha M. Cook, David Ljungberg, Oliver Hensel, Abozar Nasirahmadi. Artificial Intelligence in Agriculture, 2025, Issue 3, pp. 377-394 (18 pages)
To reduce damage caused by insect pests, farmers apply insecticides to protect crops. This practice leads to heavy use of synthetic chemicals, because a large portion of the applied insecticide does not reach its intended target; instead, it may affect non-target organisms and pollute the environment. One way to mitigate this is to apply insecticides selectively, only to those crop plants (or patches of plants) where the pests are located, sparing non-target and beneficial insects. The first step toward this goal is to identify insects on plants and to discriminate between pests and beneficial non-targets. However, detecting small individual insects with image-based machine learning is challenging, especially in natural field settings. This paper proposes a method based on explainable artificial intelligence (XAI) feature selection and machine learning to detect pests and beneficial insects in field crops. An insect-plant dataset reflecting real field conditions was created. It comprises two pest insects, the Colorado potato beetle (CPB, Leptinotarsa decemlineata) and the green peach aphid (Myzus persicae), and the beneficial seven-spot ladybird (Coccinella septempunctata). The specialist herbivore CPB was imaged only on potato plants (Solanum tuberosum), while green peach aphids and seven-spot ladybirds were imaged on three crops: potato, faba bean (Vicia faba), and sugar beet (Beta vulgaris subsp. vulgaris). This increased the diversity of the dataset, broadening the potential application of the method to discriminating between pests and beneficial insects in several crops. The insects were imaged in both laboratory and outdoor settings. Regions of interest were identified with the GrabCut algorithm, and shape, texture, and colour features were then extracted from the segmented regions. Explainable AI was incorporated through permutation feature importance ranking and Shapley Additive Explanations (SHAP) values to identify the feature set that optimized model performance while reducing computational complexity. The proposed XAI feature selection method was compared with conventional feature selection techniques, including mutual information, the chi-square coefficient, the maximal information coefficient, the Fisher separation criterion, and variance thresholding. Compared with using all features, the results showed improved accuracy (92.62% for random forest, 90.16% for support vector machine, 83.61% for K-nearest neighbours, and 81.97% for naïve Bayes) and reductions in the number of model parameters and memory usage (7.22 × 10^7 for random forest, 6.23 × 10^3 for support vector machine, 3.64 × 10^4 for K-nearest neighbours, and 1.88 × 10^2 for naïve Bayes). Prediction and training times were also reduced by approximately half compared with conventional feature selection techniques. This demonstrates that a simple machine learning algorithm combined with a well-chosen feature selection methodology can achieve robust performance comparable to other methods. With feature selection, model performance can be maximized and hardware requirements reduced, both of which are essential for real-world applications with resource constraints. This research offers a reliable approach to the automatic detection and discrimination of pests and beneficial insects, which will facilitate the development of alternative pest control approaches and other targeted pest removal methods that are less harmful to the environment than broad-scale application of synthetic insecticides.
Keywords: feature screening; explainable artificial intelligence; targeted pest control; sustainable agriculture
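Permutation feature importance, one of the two explainable-AI tools named in the abstract above, measures how much a fitted model's accuracy drops when one feature's column is randomly shuffled. The sketch below shows that idea with a tiny nearest-centroid classifier standing in for the paper's random forest, SVM, KNN and naïve Bayes models; the function names are hypothetical, SHAP values are not covered, and importance is computed on the training data only for brevity (a held-out set is the usual choice).

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(nearest_centroid_predict(model, X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops.append(baseline - np.mean(nearest_centroid_predict(model, X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = np.column_stack([4.0 * y + rng.normal(size=200), rng.normal(size=200)])
model = nearest_centroid_fit(X, y)
imp = permutation_importance(model, X, y)
```

Features with importance near zero (here the pure-noise second column) are candidates for removal, which is how such a ranking can shrink the feature set and the model's memory footprint.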
Variable screening with missing covariates: a discussion of ‘statistical inference for nonignorable missing data problems: a selective review’ by Niansheng Tang and Yuanyuan Ju
10
Authors: Fang Fang, Lyu Ni. Statistical Theory and Related Fields, 2018, Issue 2, pp. 134-136 (3 pages)
Feature screening with missing data is a critical problem that has not been well addressed in the literature. In this discussion, we propose a new screening index based on “information value” and apply it to feature screening with missing covariates.
Keywords: feature screening; missing at random; missing covariates
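"Information value" (IV) is a standard credit-scoring quantity for a binary response: after binning the covariate, IV = Σ_k (p1_k − p0_k) · ln(p1_k / p0_k), where p1_k and p0_k are the shares of positive and negative cases falling in bin k. The discussion above does not spell out its exact index, so the sketch below is only an assumption-laden illustration: it assumes a binary response, equal-frequency bins, additive smoothing for empty bins, and treats missing covariate values as their own bin. The function name is hypothetical.

```python
import numpy as np


def information_value(x, y, n_bins=5, eps=0.5):
    """Information value of covariate x for binary y, with missing x as its own bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    missing = np.isnan(x)
    bins = np.full(x.shape[0], n_bins)  # index n_bins is reserved for missing values
    observed = x[~missing]
    # Equal-frequency bin edges from the observed values (needs >= 1 observed value).
    edges = np.quantile(observed, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins[~missing] = np.searchsorted(edges, observed, side="right")
    n_cells = n_bins + 1
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == 0)
    iv = 0.0
    for k in range(n_cells):
        in_k = bins == k
        # Additive smoothing (eps) keeps the log finite for empty cells.
        p1 = (np.sum(in_k & (y == 1)) + eps) / (n_pos + eps * n_cells)
        p0 = (np.sum(in_k & (y == 0)) + eps) / (n_neg + eps * n_cells)
        iv += (p1 - p0) * np.log(p1 / p0)
    return float(iv)


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
x_informative = y + 0.3 * rng.normal(size=500)
x_informative[:25] = np.nan  # some covariate values are missing
x_noise = rng.normal(size=500)
ivs = [information_value(x_informative, y), information_value(x_noise, y)]
```

Every term (p1_k − p0_k) · ln(p1_k / p0_k) is non-negative, so IV is zero only when the bin shares match across classes; screening keeps covariates with the largest IV.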
A review of distributed statistical inference (cited by 2)
11
Authors: Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang. Statistical Theory and Related Fields, 2022, Issue 2, pp. 89-99 (11 pages)
The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, while also providing opportunities for researchers to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to deal with large-scale statistical optimization problems. This paper aims to provide a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models. Their key ideas and theoretical properties are summarized, and the trade-off between communication cost and estimation precision, together with other concerns, is discussed.
Keywords: distributed computing; divide-and-conquer; communication efficiency; shrinkage methods; nonparametric estimation; principal component analysis; feature screening; bootstrap
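The simplest divide-and-conquer scheme covered by reviews like the one above is one-shot averaging: split the rows across machines, fit the same estimator on each shard, and average the local estimates, so each machine sends only a p-dimensional vector. The sketch below applies this to ordinary least squares; it is a minimal illustration with hypothetical function names, not a specific method from the review.

```python
import numpy as np


def local_ols(X, y):
    """Least-squares fit computed on a single machine's data shard."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def one_shot_average(X, y, n_machines):
    """Divide-and-conquer OLS: fit on each row shard, then average the estimates."""
    X_shards = np.array_split(X, n_machines)
    y_shards = np.array_split(y, n_machines)
    # Each machine would send only its p-dimensional estimate to the centre.
    local_estimates = [local_ols(Xk, yk) for Xk, yk in zip(X_shards, y_shards)]
    return np.mean(local_estimates, axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=10_000)
beta_avg = one_shot_average(X, y, n_machines=10)
beta_full = local_ols(X, y)
```

The averaged estimate stays close to the full-data fit while communicating only one round of p numbers per machine; the communication-precision trade-off discussed in the review arises when shards become small relative to p, where multi-round methods are typically needed.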