Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from nume...Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data,we proposed a novel feature selection method named AMF-SGSK based on adaptive multi-filter and subspace-based gaining sharing knowledge.Firstly,the balanced dataset was obtained by random under-sampling.Secondly,combining the feature importance score with the AUC score for each filter method,we proposed a concept called feature hardness to judge the importance of feature,which could adaptively select the essential features.Finally,the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces.This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data.The experiment results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than other eight commonly used algorithms including BGWO and IG-SSO in terms of F1-score,AUC,and G-mean.The mean values of F1-score,AUC,and Gmean for AMF-SGSK are 0.950,0.967,and 0.965,respectively,achieving the highest among all algorithms.And the mean value of Gmean is higher than those of IG-PSO,ReliefF-GWO,and BGOA by 3.72%,11.12%,and 20.06%,respectively.Furthermore,the selected feature ratio is below 0.01 across the selected ten datasets,further demonstrating the proposed method’s overall superiority over competing approaches.AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data,providing scientific and technological references for practical applications.展开更多
基金supported by Fundamental Research Program of Shanxi Province(Nos.202203021211088,202403021212254,202403021221109)Graduate Research Innovation Project in Shanxi Province(No.2024KY616).
文摘Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance.To address the problem of low classification accuracy for minority class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data,we proposed a novel feature selection method named AMF-SGSK based on adaptive multi-filter and subspace-based gaining sharing knowledge.Firstly,the balanced dataset was obtained by random under-sampling.Secondly,combining the feature importance score with the AUC score for each filter method,we proposed a concept called feature hardness to judge the importance of feature,which could adaptively select the essential features.Finally,the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces.This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data.The experiment results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than other eight commonly used algorithms including BGWO and IG-SSO in terms of F1-score,AUC,and G-mean.The mean values of F1-score,AUC,and Gmean for AMF-SGSK are 0.950,0.967,and 0.965,respectively,achieving the highest among all algorithms.And the mean value of Gmean is higher than those of IG-PSO,ReliefF-GWO,and BGOA by 3.72%,11.12%,and 20.06%,respectively.Furthermore,the selected feature ratio is below 0.01 across the selected ten datasets,further demonstrating the proposed method’s overall superiority over competing approaches.AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data,providing scientific and technological references for practical applications.