Abstract
Feature selection is an essential step in cancer classification with DNA microarrays, because there are a large number of genes from which to predict classes but relatively few samples. Building an effective tumor classification model from gene expression profiles therefore depends on accurately identifying the small set of feature genes that determines sample class. Based on an analysis of the characteristics of tumor gene expression profiles, this work proposes a two-step feature selection method for selecting such a gene subset from broad patterns of gene expression. The first step uses a new class separability criterion, proposed in this paper, to remove genes irrelevant to the classification task; a support vector machine (SVM) with a radial basis function (RBF) kernel is then used as the classifier to validate how well the selected genes distinguish different tissue types. The second step removes redundant genes by pairwise redundancy analysis followed by sensitivity analysis based on the SVM classification model. Applied to feature gene selection for human acute leukemia subtype classification, the method yields a more compact and better-performing gene subset than the baseline method, demonstrating its feasibility and effectiveness.
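The abstract names the steps of the pipeline but not their formulas, so the following is only a minimal, stdlib-only Python sketch under stated assumptions. Golub's signal-to-noise ratio stands in for the paper's new separability criterion (which the abstract does not define); pairwise redundancy is measured with Pearson correlation and a hypothetical threshold of 0.9; and sensitivity is taken as the mean absolute gradient of a trained RBF-SVM decision function. All function names are hypothetical, and SVM training itself is left to an external library.

```python
import math
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Golub-style signal-to-noise ratio of one gene between two classes.

    Assumption: a stand-in for the paper's own separability criterion.
    """
    # Small epsilon avoids division by zero for constant genes.
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-12)


def relevance_filter(X, y, k):
    """Step 1: keep the k genes with the largest |SNR| as (gene, score) pairs.

    X is a list of samples (rows of gene values); y holds labels 1 and 0.
    """
    scored = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == 0]
        scored.append((g, abs(signal_to_noise(a, b))))
    scored.sort(key=lambda gs: gs[1], reverse=True)
    return scored[:k]


def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    norm = math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))
    return cov / norm if norm > 0 else 0.0


def drop_redundant(X, ranked, threshold=0.9):
    """Step 2a: greedy pairwise redundancy removal (hypothetical threshold).

    Walk genes from most to least relevant and keep a gene only if its
    |Pearson r| with every already-kept gene is below `threshold`.
    """
    kept = []
    for g, _ in ranked:
        col = [row[g] for row in X]
        if all(abs(pearson(col, [row[h] for row in X])) < threshold for h in kept):
            kept.append(g)
    return kept


def rbf_sensitivity(support_vectors, dual_coefs, gamma, samples):
    """Step 2b: mean |df/dx_j| of an RBF-SVM decision function over samples.

    f(x) = sum_i c_i * exp(-gamma * ||x - s_i||^2) + b with c_i = alpha_i * y_i,
    so df/dx_j = sum_i c_i * (-2 * gamma) * (x_j - s_ij) * K(x, s_i);
    the bias b drops out of the gradient.
    """
    n = len(samples[0])
    totals = [0.0] * n
    for x in samples:
        grad = [0.0] * n
        for s, c in zip(support_vectors, dual_coefs):
            k = math.exp(-gamma * sum((xj - sj) ** 2 for xj, sj in zip(x, s)))
            for j in range(n):
                grad[j] += c * (-2.0 * gamma) * (x[j] - s[j]) * k
        # Accumulate the absolute gradient per sample, then average below.
        for j in range(n):
            totals[j] += abs(grad[j])
    return [t / len(samples) for t in totals]
```

For example, with scikit-learn's `SVC(kernel="rbf")` fitted on the kept genes using an explicit numeric `gamma`, the `support_vectors_` and `dual_coef_[0]` attributes of a binary model supply the inputs to `rbf_sensitivity`. One plausible elimination loop drops the least sensitive gene, retrains, and stops once validation accuracy falls; that loop and its stopping rule are assumptions, not details given in the abstract.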
Source
Journal of Computer Research and Development (《计算机研究与发展》), 2005, No. 10, pp. 1796-1801 (6 pages)
Indexed in: EI, CSCD, Peking University Core Journals
Funding
Key Program of the National Natural Science Foundation of China (Grant No. 60234020)
Keywords
feature selection
support vector machine
gene expression profiles
cancer