Abstract
Most existing text feature selection methods are serial, which makes them slow when applied to massive Chinese text data sets. Using parallelism to improve the efficiency of text feature selection has therefore become an active research topic in text mining. This paper combines a genetic algorithm with parallel collaborative evolution to design a parallel collaborative evolutionary genetic algorithm (PCEGA) based on rough sets, and applies it to text feature selection. The method uses the genetic algorithm to search for features and parallel collaborative evolution to improve time efficiency, so that a more representative feature subset is obtained quickly. Experimental results show that the method is effective.
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
Peking University Core Journals (北大核心)
2012, Issue 10, pp. 2215-2220 (6 pages)
Systems Engineering-Theory & Practice
Funding
National Natural Science Foundation of China (12CGL004)
Youth Science Research Fund of Lanzhou Jiaotong University (2011005)
Keywords
feature selection
text mining
genetic algorithm
collaborative evolution
rough sets