Abstract
Using machine learning methods to identify genes relevant to a new subtype classification of gastric cancer can provide markers and a basis for exploring the molecular mechanisms of gastric cancer and for its gene-level diagnosis and treatment. The study used oligonucleotide gene chip (microarray) data from 33 Chinese patients with gastric cancer: 13 diffuse-type and 20 intestinal-type samples, each measured on 21,378 genes. A multi-step dimensionality reduction method combining significance analysis of microarrays (SAM), partial least squares variable importance in projection (PLS-VIP) coefficients, and a Bhattacharyya-distance-based sequential forward search (BD-SFS) extracted 20 feature genes that effectively separate the diffuse-type samples from the intestinal-type samples. Using these 20 genes, a support vector machine (SVM) classifier reached an accuracy of about 89.4%, and hierarchical clustering reached 93.94%. In addition, analysis of the genes' biological significance showed that most of the selected marker genes are important for the diagnosis and molecular classification of human malignant tumors.
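The abstract names the three selection stages but not their parameters or cut-offs. The following NumPy-only sketch shows one way a SAM → PLS-VIP → BD-SFS cascade of this kind could be wired together; it is not the authors' implementation. Assumptions, all hypothetical: the SAM fudge factor s0 is taken as the median gene-wise standard error (no s0 search and no permutation FDR), PLS uses two NIPALS components, a ridge of 0.01 is added to each class covariance so the Bhattacharyya distance stays finite with 13 and 20 samples, and the cut-offs `n_sam` and `n_vip` as well as all function names are illustrative. The SVM and clustering stages are omitted.

```python
import numpy as np


def sam_scores(X, y, s0=None):
    """SAM-style relative difference d_i = (mean_1 - mean_0) / (s_i + s0).

    X: (n_samples, n_genes); y: 0/1 labels. Simplified: s0 defaults to the
    median s_i, and no permutation-based FDR is computed.
    """
    a, b = X[y == 1], X[y == 0]
    n1, n2 = len(a), len(b)
    ss = ((a - a.mean(0)) ** 2).sum(0) + ((b - b.mean(0)) ** 2).sum(0)
    s = np.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    if s0 is None:
        s0 = np.median(s)
    return (a.mean(0) - b.mean(0)) / (s + s0)


def pls_vip(X, y, n_comp=2):
    """VIP scores of a PLS1 model fitted with NIPALS on centred data."""
    X = X - X.mean(0)
    y = y.astype(float) - y.mean()
    p = X.shape[1]
    W, SS = [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)               # unit-norm weight vector
        t = X @ w                            # score vector
        tt = t @ t
        q = (y @ t) / tt
        X = X - np.outer(t, X.T @ t / tt)    # deflate X
        y = y - q * t                        # deflate y
        W.append(w)
        SS.append(q ** 2 * tt)               # y-variance explained by this component
    W, SS = np.array(W), np.array(SS)        # W: (n_comp, p)
    return np.sqrt(p * (SS @ W ** 2) / SS.sum())


def bhattacharyya(Xa, Xb, ridge=1e-2):
    """Bhattacharyya distance between two classes assumed Gaussian."""
    d = Xa.shape[1]
    ca = np.cov(Xa, rowvar=False).reshape(d, d) + ridge * np.eye(d)
    cb = np.cov(Xb, rowvar=False).reshape(d, d) + ridge * np.eye(d)
    c = (ca + cb) / 2
    diff = Xa.mean(0) - Xb.mean(0)
    term1 = diff @ np.linalg.solve(c, diff) / 8
    term2 = 0.5 * (np.linalg.slogdet(c)[1]
                   - 0.5 * (np.linalg.slogdet(ca)[1] + np.linalg.slogdet(cb)[1]))
    return term1 + term2


def bd_sfs(X, y, k):
    """Greedy forward search: add the gene that maximises the class distance."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: bhattacharyya(
            X[y == 1][:, selected + [j]], X[y == 0][:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected


def select_genes(X, y, n_sam=200, n_vip=50, k=20):
    """Hypothetical cascade SAM -> PLS-VIP -> BD-SFS; returns column indices of X."""
    idx = np.argsort(-np.abs(sam_scores(X, y)))[:n_sam]
    vip = pls_vip(X[:, idx], y)
    idx = idx[np.argsort(-vip)[:n_vip]]
    return [int(idx[j]) for j in bd_sfs(X[:, idx], y, k)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1] * 13 + [0] * 20)    # 13 "diffuse" vs 20 "intestinal" (synthetic)
    X = rng.normal(size=(33, 500))
    X[y == 1, :5] += 3.0                 # plant 5 informative genes
    print(select_genes(X, y, n_sam=50, n_vip=20, k=5))
```

With `k=20` the returned columns would play the role of the 20 feature genes, and an SVM or hierarchical clustering would then be run on those columns only. When estimating accuracy on so few samples, repeating the whole selection inside each cross-validation fold avoids an optimistically biased estimate.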
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2009, No. 4, pp. 554-560 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 60234020)
Keywords
gastric cancer
gene expression profile
marker genes
feature selection