Abstract: Feature selection (FS) plays a crucial role in medical imaging by reducing dimensionality, improving computational efficiency, and enhancing diagnostic accuracy. Traditional FS techniques, including filter, wrapper, and embedded methods, have been widely used but often struggle with high-dimensional and heterogeneous medical imaging data. Deep learning-based FS methods, particularly Convolutional Neural Networks (CNNs) and autoencoders, have demonstrated superior performance but lack interpretability. Hybrid approaches that combine classical and deep learning techniques have emerged as a promising solution, offering improved accuracy and explainability. Furthermore, integrating multi-modal imaging data (e.g., Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and Ultrasound (US)) poses additional challenges in FS, necessitating advanced feature fusion strategies. Multi-modal feature fusion combines information from different imaging modalities to improve diagnostic accuracy. Recently, quantum computing has gained attention as a revolutionary approach for FS, providing the potential to handle high-dimensional medical data more efficiently. This systematic literature review comprehensively examines classical, Deep Learning (DL), hybrid, and quantum-based FS techniques in medical imaging. Key outcomes include a structured taxonomy of FS methods, a critical evaluation of their performance across modalities, and identification of core challenges such as computational burden, interpretability, and ethical considerations. Future research directions, such as explainable AI (XAI), federated learning, and quantum-enhanced FS, are also emphasized to bridge the current gaps. This review provides actionable insights for developing scalable, interpretable, and clinically applicable FS methods in the evolving landscape of medical imaging.
Abstract: Feature selection has long been an important issue in the visual SLAM (simultaneous localization and mapping) literature. Considering that location estimation can be improved by tracking features with a longer visible time, a new feature selection method based on motion estimation is proposed. First, a k-step iteration algorithm is presented for visible time estimation using an affine motion model; then a delayed feature detection method is introduced for efficiently detecting features with the maximum visible time. To validate the proposed method, both simulation and real-data experiments are carried out. Results show that the proposed method improves both estimation performance and computational performance compared with the existing random feature selection method.
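The idea of estimating visible time by iterating an affine motion model can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `visible_steps` and `select_longest_visible`, the parameter layout of the affine model, and the "stay inside the image bounds" visibility test are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical layout of a 2D affine motion model:
#   x' = a*x + b*y + tx
#   y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def visible_steps(point: Point, affine: Affine,
                  width: float, height: float, k: int) -> int:
    """Apply the affine model up to k times and count how many
    consecutive predicted positions stay inside the image."""
    a, b, tx, c, d, ty = affine
    x, y = point
    steps = 0
    for _ in range(k):
        x, y = a * x + b * y + tx, c * x + d * y + ty
        if not (0 <= x < width and 0 <= y < height):
            break  # the feature is predicted to leave the field of view
        steps += 1
    return steps


def select_longest_visible(points: Sequence[Point], affine: Affine,
                           width: float, height: float,
                           k: int, n: int) -> List[Point]:
    """Keep the n candidate features with the largest estimated visible time."""
    ranked = sorted(points,
                    key=lambda p: visible_steps(p, affine, width, height, k),
                    reverse=True)
    return ranked[:n]


# Pure translation of +10 px per frame along x (an assumed toy motion).
shift_right = (1.0, 0.0, 10.0, 0.0, 1.0, 0.0)
candidates = [(50.0, 20.0), (5.0, 20.0), (95.0, 20.0)]
print(select_longest_visible(candidates, shift_right, 100, 100, k=10, n=2))
```

In this toy setting, features near the edge the camera is moving away from are predicted to remain visible longest, so they are ranked first.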
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R^2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R^2 to 88.95%. In Experiment 4, a data-efficient training approach was introduced, in which the training data portion was increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R^2 of 85.49%, indicating an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
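Selecting the top-N features most correlated with the essay score, as in the baseline experiment, can be sketched as follows. The abstract does not specify the correlation measure or the feature names, so Pearson correlation, ranking by absolute value, and the toy feature names below are assumptions.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_n_correlated(features: Dict[str, Sequence[float]],
                     scores: Sequence[float], n: int) -> List[str]:
    """Rank features by |correlation with the score| and keep the top n."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], scores)),
                    reverse=True)
    return ranked[:n]


# Hypothetical similarity features for four essays and their human scores.
toy_features = {
    "cosine_sim": [1.0, 2.0, 3.0, 4.0],
    "constant":   [1.0, 1.0, 1.0, 1.0],
    "noisy_sim":  [4.0, 1.0, 3.0, 2.0],
}
toy_scores = [2.0, 4.0, 6.0, 8.0]
print(top_n_correlated(toy_features, toy_scores, n=2))
```

Ranking by absolute correlation keeps strongly negative predictors as well as positive ones; whether the study did the same is not stated in the abstract.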
Abstract: Accurate forecasting of renewable power generation is crucial for grid stability and cost efficiency. Feature selection in AI-based forecasting remains challenging due to high data acquisition cost, lack of transparency, and limited user control. We introduce a transparent and cost-sensitive feature selection framework for renewable power forecasting that leverages Explainable Artificial Intelligence (XAI). We integrate SHapley Additive exPlanations (SHAP) and Explain Like I'm 5 (ELI5) to identify dominant and redundant features. This approach enables systematic dataset reduction without compromising model performance. Our case study, based on Photovoltaic (PV) generation data, evaluates the approach across four experimental setups. Experimental results indicate that our XAI-based feature selection reduces the dominance index from 0.37 to 0.17, maintains high predictive accuracy (R^2 = 0.94, drop < 0.04), and lowers data acquisition costs. Furthermore, eliminating dominant features improves robustness to noise and reduces performance variance by a factor of three compared to the baseline scenario. The developed framework enhances interpretability, supports human-in-the-loop decision-making, and introduces a cost-sensitive objective function for feature selection. By combining transparency, robustness, and efficiency, we contribute to the development and implementation of Trustworthy AI (TAI) applications in energy forecasting, providing a scalable solution for industrial deployment.
Funding: Supported in part by the National Science Foundation (NSF) (Nos. 1447711, 1743418, and 1843025).
Abstract: Feature selection is a crucial problem in efficient machine learning, and it also contributes greatly to the explainability of machine-driven decisions. Methods such as decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches can only be applied to a small subset of machine learning models. Wrapper-based methods can select features independently of the machine learning model, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We conduct a theoretical computational complexity analysis and further explain our algorithms' generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. Results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with high efficiency.
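For readers unfamiliar with randomized wrapper methods, the following minimal sketch shows the general pattern: sample feature subsets at random, score each with a black-box evaluation of the base model, and keep the best. It is plain random subset search, not the breadth searching or attention searching adjustment proposed in the paper; the function name `random_wrapper_search`, the 0.5 inclusion probability, and the toy scoring function are assumptions.

```python
import random
from typing import Callable, FrozenSet, Tuple


def random_wrapper_search(n_features: int,
                          evaluate: Callable[[FrozenSet[int]], float],
                          n_trials: int = 200,
                          seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Sample random feature subsets and return the best-scoring one.

    `evaluate` stands in for training and validating any base model on
    the given subset; the search never looks inside the model.
    """
    rng = random.Random(seed)
    best_subset: FrozenSet[int] = frozenset()
    best_score = evaluate(best_subset)
    for _ in range(n_trials):
        # Include each feature independently with probability 0.5.
        subset = frozenset(i for i in range(n_features) if rng.random() < 0.5)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy scorer (an assumption): features 0 and 2 are useful, others cost 0.1 each.
USEFUL = frozenset({0, 2})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


subset, score = random_wrapper_search(n_features=4, evaluate=toy_score)
print(sorted(subset), score)
```

Because every trial is independent, the evaluations can be distributed across workers, which is one reason randomized wrappers parallelize well in general.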
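Abstract: For the two-class target feature selection problem, the linear separability of the feature space is first discussed and a criterion for determining it is given. Second, drawing on the principle of support vector machines, the basic properties of the feature separability criterion are analyzed. Finally, a feature efficiency is defined according to each feature's contribution to the classification margin, and it is used for feature selection and feature-space dimensionality reduction. Experimental results on measured data and on the public UCI (University of California, Irvine) repository show that, compared with the classical Relief feature selection algorithm, the proposed algorithm clearly improves recognition performance and generalization ability.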
Funding: Sponsored in part by the National Natural Science Foundation of China (Nos. 61872217 and 61527812); the Industrial Internet Innovation & Development Project of the Ministry of Industry and Information Technology of China; the National Science and Technology Major Project (No. 2016ZX01038101); MIIT IT funds (Research and Application of TCN Key Technologies) of China; and the National Key Technology R&D Program (No. 2015BAG14B01-02).
Abstract: Railway transportation plays an important role in modern society. As China's massive railway transportation network continues to grow in total mileage and operation density, the energy consumption of trains has become a serious concern. For any given route, the geographic characteristics are known a priori, but the parameters of trains (e.g., loading and marshaling) vary from one trip to another. An extensive analysis of train operation data suggests that the control gear operation of trains is the most important factor affecting energy consumption. This observation implies that the problem of energy-efficient train driving has to be addressed by considering both the geographic information and the trip parameters. However, the problem is difficult to solve due to its high dimension, nonlinearity, complex constraints, and time-varying characteristics. Faced with these difficulties, we propose an energy-efficient train control framework based on a hierarchical ensemble learning approach. Through hierarchical refinement, we learn prediction models of speed and gear. The learned models can be used to derive optimized driving operations under real-time requirements. This study uses random forest and bagging-REPTree as the classification and regression algorithms, respectively. We conduct an extensive study on the potential of bagging, decision trees, random forest, and feature selection to design an effective hierarchical ensemble learning framework. The proposed framework was validated through simulation. The average energy consumption of the proposed method is over 7% lower than that of human drivers.
Abstract: Perovskite solar cells (PSCs) have achieved remarkable advancements in recent years [1]. Devices achieving high power conversion efficiencies (PCEs) typically rely on molecular contacts featuring conjugated cores [2]. The planar and conjugated cores facilitate ordered molecular stacking through π-π interactions, thereby enhancing charge transport and selectivity [3,4].