Abstract
Feature selection is an important data preprocessing step in machine learning, pattern recognition, data mining, and related fields. Data collected in practice are often high-dimensional and contain a large amount of redundant and noisy data, which increases computation time and can mislead the modeling results. This paper proposes a feature selection algorithm that combines the generalized importance of attribute subsets with the runner-root algorithm, an intelligent optimization method. The runner-root algorithm performs the iterative search, and a fitness function built from the generalized importance of the attribute subset and the size of the selected feature subset evaluates each candidate subset, so that feature subsets important for decision making are searched out across the entire sample space as far as possible. Experimental results show that the algorithm selects effective feature subsets and that classification models built on them achieve relatively high accuracy.
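The fitness evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the generalized importance measure, so the classical rough-set dependency degree (the fraction of samples whose equivalence class under the chosen attributes carries a single decision label) stands in for it. The function names, the weight `alpha`, and the weighted-sum form of the fitness are all hypothetical. The runner-root search itself is not shown; any binary optimizer that proposes 0/1 feature masks could call `fitness`.

```python
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Stand-in importance measure (assumption): share of samples whose
    equivalence class under `subset` contains only one decision label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        # Samples with identical values on the chosen attributes fall in one class.
        groups[tuple(row[i] for i in subset)].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def fitness(rows, labels, mask, alpha=0.9):
    """Hypothetical fitness: reward importance, penalize subset size.
    `mask` is a 0/1 list with one entry per feature; `alpha` is an assumed weight."""
    subset = [i for i, bit in enumerate(mask) if bit]
    if not subset:
        return 0.0  # an empty subset is treated as the worst candidate
    size_term = 1.0 - len(subset) / len(mask)
    return alpha * dependency_degree(rows, labels, subset) + (1 - alpha) * size_term


# Toy data: feature 0 determines the label, feature 1 is noise.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 0, 1, 1]

for mask in ([1, 0, 0], [1, 1, 1], [0, 1, 0]):
    print(mask, round(fitness(rows, labels, mask), 4))
```

On this toy data the single informative feature scores above the full feature set, which in turn scores above the noise-only feature, so a search guided by such a fitness is pushed toward small subsets that still separate the decision classes.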
Authors
吴尚智
徐丹丹
王旭文
夏宁
WU Shang-zhi; XU Dan-dan; WANG Xu-wen; XIA Ning (College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)
Source
《计算机工程与科学》
CSCD
PKU Core Journal
2022, No. 4, pp. 723-729 (7 pages)
Computer Engineering & Science
Funding
National Natural Science Foundation of China (61561043)
Natural Science Foundation of Gansu Province (1010RJZA011).