A data structure and function classification based method to evaluate clustering models for gene expression data

Abstract

Objective: To establish a systematic framework for selecting the best clustering algorithm and to provide an evaluation method for clustering analyses of gene expression data.

Methods: Clustering analyses of gene expression data were evaluated with 2 approaches. One is based on data structure (internal information). The other is based on functional classification (external information). First, to assess the predictive power of clustering algorithms, entropy was introduced to measure how consistent the clustering results of different algorithms are with known, validated functional classifications. Second, a modified figure of merit (adjust-FOM) was used as the internal assessment. In this method, the clustering algorithm is applied to the data from all experimental conditions except one, and the left-out condition is used to assess the predictive power of the resulting clusters. The method was applied to 3 gene expression data sets: 2 from Iyer's serum data sets and 1 from Ferea's Saccharomyces cerevisiae data set.

Results: The entropy- and FOM-based method was used to compare the clusterings of the 3 data sets produced by 6 different algorithms. SOM and fuzzy clustering showed the highest clustering ability.

Conclusion: An entropy-based method for evaluating clustering analyses is proposed for the first time. Evaluating the same data set against different functional classifications yields different results. According to the adjust-FOM and Entropy-FOM curves, SOM and fuzzy clustering show the highest clustering ability on all 3 data sets.
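The abstract describes both evaluation measures only in words. The following is a minimal Python sketch of one plausible reading, and it is not the authors' implementation. It makes two assumptions. First, the entropy score is taken to be the common size-weighted entropy of functional-class labels within each cluster, where lower means more consistent with the known classification. Second, adjust-FOM is taken to follow the adjusted figure of merit of Yeung et al. (reference [9]). Under that reading, each condition is left out in turn, the genes are clustered on the remaining conditions, and the root-mean-square deviation of the left-out condition from its cluster means is computed and divided by sqrt((n - k)/n). The paper's exact formulas are not given here. The names `cluster_entropy`, `adjusted_fom` and `split_by_mean` are hypothetical. `split_by_mean` is only a toy stand-in for a real algorithm such as SOM or fuzzy c-means.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Matrix = list[list[float]]  # rows = genes, columns = experimental conditions


def cluster_entropy(clusters: Sequence[int], classes: Sequence[str]) -> float:
    """Size-weighted entropy (bits) of known functional classes within clusters.

    0.0 means every cluster holds genes from a single functional class;
    larger values mean the clusters mix classes.
    """
    n = len(clusters)
    total = 0.0
    for c in set(clusters):
        members = [classes[i] for i in range(n) if clusters[i] == c]
        size = len(members)
        h = -sum((m / size) * math.log2(m / size)
                 for m in Counter(members).values())
        total += (size / n) * h
    return total


def split_by_mean(rows: Matrix, k: int) -> list[int]:
    """Hypothetical stand-in clusterer: rank genes by mean, cut into k groups."""
    order = sorted(range(len(rows)), key=lambda i: sum(rows[i]) / len(rows[i]))
    labels = [0] * len(rows)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(rows)
    return labels


def adjusted_fom(data: Matrix, k: int,
                 cluster_fn: Callable[[Matrix, int], list[int]]) -> float:
    """Aggregate adjusted FOM: sum over left-out conditions e (requires n > k)."""
    n, m = len(data), len(data[0])
    total = 0.0
    for e in range(m):
        # Cluster using every condition except e.
        reduced = [row[:e] + row[e + 1:] for row in data]
        labels = cluster_fn(reduced, k)
        # Squared deviation of condition e from each cluster's mean in e.
        sq = 0.0
        for c in set(labels):
            vals = [data[i][e] for i in range(n) if labels[i] == c]
            mu = sum(vals) / len(vals)
            sq += sum((v - mu) ** 2 for v in vals)
        fom = math.sqrt(sq / n)
        # Assumed Yeung et al. correction for the FOM drop caused by k alone.
        total += fom / math.sqrt((n - k) / n)
    return total


if __name__ == "__main__":
    print(cluster_entropy([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 0.0
    print(cluster_entropy([0, 0, 1, 1], ["a", "b", "a", "b"]))  # 1.0
    expr = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(adjusted_fom(expr, 2, split_by_mean))
```

Under these assumptions, lower entropy and lower adjusted FOM both indicate better clustering. You can compute either score over a range of k for each algorithm. This yields curves that can be compared across algorithms.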

Authors: Yi Dong (易东), Yang Mengsu (杨梦苏), Huang Minghui (黄明辉), Li Huizhi (李辉智), Wang Wenchang (王文昌)

Affiliations: Department of Medical Statistics; Applied Research Centre for Genomics Technology; Department of Electronic Technology

Source: Journal of Medical Colleges of PLA (China) (中国人民解放军军医大学学报（英文版）), 2002, No. 4, pp. 312-317 (6 pages). Indexed in CAS.

Keywords: gene expression; evaluation of clustering; adjust-FOM; entropy; data structure; function classification; model

CLC number: R311 [Medicine and Health: Basic Medicine]

References (12)

[1] Collins FS, Patrinos A, Jordan E, et al. New goals for the U.S. Human Genome Project: 1998-2003 [J]. Science, 1998; 282: 682.
[2] Lipshutz RJ, Morris D, Chee M, et al. Using oligonucleotide probe arrays to access genetic diversity [J]. Biotechniques, 1995; 19(3): 442.
[3] Schena M, Shalon D, Davis RW, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray [J]. Science, 1995; 270(5235): 467.
[4] Schena M, Shalon D, Heller R, et al. Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes [J]. Proc Natl Acad Sci USA, 1996; 93(20): 10614.
[5] Heller RA, Schena M, Chai A, et al. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays [J]. Proc Natl Acad Sci USA, 1997; 94(6): 2150.
[6] Brazma A, Vilo J. Gene expression data analysis [J]. FEBS Letters, 2000; 480: 17.
[7] Geschwind DH. Sharing gene expression data: An array of options [J]. Nat Rev Neurosci, 2001; 2(6): 435.
[8] Mutch DM, Berger A, Mansourian R, et al. Microarray data analysis: A practical approach for selecting differentially expressed genes [J]. Genome Biol, 2001; 2(12): PREPRINT0009.
[9] Yeung KY, Haynor DR, Ruzzo WL. Validating clustering for gene expression data [J]. Bioinformatics, 2001; 17(4): 309.
[11] Ferea TL. Systematic changes in gene expression patterns following adaptive evolution in yeast [J]. Proc Natl Acad Sci USA, 1999; 96(17): 9721.
