Abstract
An important application of microarray gene expression data is the classification of tissue samples. In such data, the number of genes is usually much larger than the number of samples; that is, the data matrix has far more variables (genes) than samples. This high dimensionality makes classification very challenging. This paper combines a new feature extraction method, uncorrelated linear discriminant analysis (ULDA), with the support vector machine (SVM) classification algorithm to classify colon cancer tissue samples. A comparison with other methods shows improved classification performance, and the results demonstrate the feasibility and effectiveness of the approach.
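The two-stage idea in the abstract (project many genes onto a few discriminant directions, then classify with an SVM) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the data are synthetic, the sample and gene counts are hypothetical, and scikit-learn's `LinearDiscriminantAnalysis` with the SVD solver stands in for ULDA, which scikit-learn does not provide. It is not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: far more genes than samples.
# The sizes below are hypothetical and not taken from the paper.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 500
y = np.array([0] * 30 + [1] * 30)          # two tissue classes
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :20] += 1.5                      # first 20 genes carry class signal

# Feature extraction followed by classification.
# Classical LDA (SVD solver) is used here in place of ULDA; with two classes
# it reduces each sample to a single discriminant feature.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(solver="svd"),
    SVC(kernel="linear"),
)

# 5-fold cross-validated accuracy; the projection is refit inside each fold,
# so test samples never influence the extracted features.
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

# Inspect the reduced representation produced by the first two steps.
model.fit(X, y)
Z = model[:-1].transform(X)
print("reduced shape:", Z.shape)
```

Fitting the projection inside the cross-validation loop is a deliberate choice: when variables greatly outnumber samples, extracting features on the full data set before splitting tends to give optimistic accuracy estimates.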
Source
Computer and Modernization (《计算机与现代化》)
2008, No. 8, pp. 104-106, 109 (4 pages)
Keywords
uncorrelated linear discriminant analysis
support vector machine (SVM)
gene expression profiling
feature extraction
classification