Abstract
Gene microarray data usually contain a large number of genes that are irrelevant to tumor classification, noisy, or redundant, and these genes can seriously reduce the accuracy of tumor diagnosis. Microarray data also suffer from having few samples but very many dimensions, which makes cancer diagnosis even harder, so gene selection is essential. We propose a new gene selection method that combines recursive feature elimination (RFE) and sequential forward selection (SFS) on the basis of a support vector machine (SVM). First, an SVM is used to compute a ranking-criterion score for each gene, and the first-order differences of these ranking scores are used to divide the genes into several small groups. RFE is then applied to the group with the smallest ranking scores to eliminate noisy genes, while SFS is applied to the group with the largest ranking scores to select informative genes. Experiments on real-life benchmark microarray datasets for leukemia, colon cancer, and breast cancer show that the proposed method runs efficiently and achieves good classification performance.
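The abstract names the steps but not their exact formulas, so the following is only a minimal, hypothetical Python sketch of how they could fit together. Several details are assumptions, not taken from the paper: the ranking score is the squared weight of a linear SVM (the criterion used by the classic SVM-RFE method); the SVM is trained by plain full-batch subgradient descent on the hinge loss so the sketch needs only the standard library; genes are cut into `n_groups` groups at the largest first-order differences of the sorted scores; and SFS is scored by training-set accuracy purely to keep the code short (a real evaluation would use cross-validation). All function names (`train_linear_svm`, `accuracy`, `split_by_gaps`, `select_genes`) are hypothetical.

```python
# Hypothetical sketch of SVM ranking + difference-based grouping + combined RFE/SFS.
# Assumptions (not from the paper): score = squared linear-SVM weight, subgradient
# SVM trainer, cuts at the largest score gaps, SFS judged by training accuracy.


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2*||w||^2 + mean hinge loss; labels y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        step = lr / t ** 0.5                      # decaying step size
        gw = [lam * wj for wj in w]               # gradient of the L2 term
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):                # subgradient of the hinge term
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def accuracy(X, y, genes):
    """Training-set accuracy of a linear SVM restricted to the given gene columns."""
    sub = [[row[j] for j in genes] for row in X]
    w, b = train_linear_svm(sub, y)
    hits = sum(1 for xi, yi in zip(sub, y)
               if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0)
    return hits / len(y)


def split_by_gaps(scores, n_groups):
    """Sort scores ascending, cut at the (n_groups - 1) largest first-order differences.

    Returns lists of indices into `scores`, lowest-scoring group first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    s = [scores[i] for i in order]
    diffs = [s[k + 1] - s[k] for k in range(len(s) - 1)]   # first-order differences
    biggest = sorted(range(len(diffs)), key=lambda k: diffs[k], reverse=True)
    cuts = sorted(biggest[: n_groups - 1])
    groups, start = [], 0
    for c in cuts:
        groups.append(order[start:c + 1])
        start = c + 1
    groups.append(order[start:])
    return groups


def select_genes(X, y, n_groups=3):
    """Each round: drop the lowest-score group (RFE), run SFS over the highest group."""
    active = list(range(len(X[0])))               # genes still under consideration
    selected, best_acc = [], 0.0
    while len(active) >= n_groups:
        w, _ = train_linear_svm([[row[j] for j in active] for row in X], y)
        score = {g: wj * wj for g, wj in zip(active, w)}   # ranking criterion
        groups = split_by_gaps([score[g] for g in active], n_groups)
        low = {active[k] for k in groups[0]}
        high = [active[k] for k in groups[-1]]
        # SFS: try high-score genes best-first, keep one only if accuracy improves
        for g in sorted(high, key=lambda g: score[g], reverse=True):
            acc = accuracy(X, y, selected + [g])
            if acc > best_acc:
                selected, best_acc = selected + [g], acc
        # Eliminate the low group; the high group has now been examined as well
        active = [g for g in active if g not in low and g not in high]
    return selected, best_acc


# Toy data: gene 0 separates the two classes, genes 1-3 are small noise.
X = [[2.0, 0.1, 0.5, -0.3], [1.8, -0.2, -0.4, 0.2], [2.2, 0.0, 0.3, 0.1],
     [-2.0, 0.2, 0.4, -0.1], [-1.9, -0.1, -0.5, 0.3], [-2.1, 0.1, -0.2, -0.2]]
y = [1, 1, 1, -1, -1, -1]
print(select_genes(X, y, n_groups=2))
```

Cutting at the largest jumps places group boundaries where the ranking score changes most abruptly, which is one plausible reading of "dividing genes by the first-order difference"; the paper may instead use a threshold on the differences or a different number of groups per round.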
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
CAS
CSCD
Peking University Core Journal (北大核心)
2010, No. 1, pp. 93-99 (7 pages)
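Funding
Ministry of Education Program for New Century Excellent Talents (NECT-2005)
Hunan Provincial Fund for Distinguished Young Scholars (06JJ1010)
Keywords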
gene selection
support vector machine
recursive feature elimination
sequential forward selection