
A data structure and function classification based method to evaluate clustering models for gene expression data

Abstract
Objective: To establish a systematic framework for selecting the best clustering algorithm and to provide an evaluation method for clustering analyses of gene expression data.
Methods: Clustering analyses of gene expression data were evaluated with two approaches, one based on data structure (internal information) and one based on functional classification (external information). First, to assess the predictive power of clustering algorithms, entropy was introduced to measure the consistency between the clustering results of different algorithms and known, validated functional classifications. Second, a modified figure of merit (adjust-FOM) was used as the internal assessment method: a clustering algorithm is applied to the data from all experimental conditions except one, and the left-out condition is used to assess the predictive power of the resulting clusters. The methods were applied to three gene expression data sets (two from Iyer's serum data sets and one from Ferea's Saccharomyces cerevisiae data set).
Results: A method combining entropy and the figure of merit (FOM) was used to compare the results of six different clustering algorithms on the three data sets; SOM and fuzzy clustering showed the highest clustering ability.
Conclusion: An entropy-based method is proposed here for the first time to evaluate clustering analyses. Evaluating the same data set against different functional classifications gives different results. According to the adjust-FOM and Entropy-FOM curves, SOM and fuzzy clustering show the highest clustering ability on the three data sets.
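The abstract does not state its formulas, so the following is only a minimal Python sketch of the two scores under stated assumptions. It assumes (a) the external score is the size-weighted Shannon entropy of the known functional classes inside each cluster, so lower means closer agreement with the classification, and (b) the internal score follows the adjusted figure of merit of Yeung et al. [9]: leave one condition out, cluster on the rest, take the root-mean-square deviation of the held-out condition from its cluster means, divide by sqrt((n - k)/n), and sum over all conditions. The function names `clustering_entropy`, `kmeans`, and `fom` are hypothetical, and the plain k-means (seeded with the first k rows) is a placeholder, not one of the six algorithms evaluated in the paper; any function returning one label per gene can be passed as `cluster`.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Rows are genes, columns are experimental conditions.
Matrix = list[list[float]]


def clustering_entropy(labels: Sequence[int], classes: Sequence[str]) -> float:
    """Size-weighted Shannon entropy (bits) of known functional classes per cluster.

    Assumed definition: 0.0 means every cluster holds genes of a single class;
    larger values mean clusters mix functional classes.
    """
    n = len(labels)
    total = 0.0
    for c in set(labels):
        members = [cls for lab, cls in zip(labels, classes) if lab == c]
        size = len(members)
        h = -sum((m / size) * math.log2(m / size) for m in Counter(members).values())
        total += size / n * h
    return total


def kmeans(rows: Matrix, k: int, iters: int = 20) -> list[int]:
    """Placeholder clusterer: Lloyd's k-means seeded with the first k rows."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each gene to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centers[j])))
            for r in rows
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fom(
    data: Matrix,
    k: int,
    cluster: Callable[[Matrix, int], list[int]] = kmeans,
    adjusted: bool = True,
) -> float:
    """Aggregate figure of merit over all left-out conditions (lower is better).

    Requires k < number of genes when adjusted=True.
    """
    n, m = len(data), len(data[0])
    total = 0.0
    for e in range(m):
        # Cluster the genes using every condition except e.
        reduced = [row[:e] + row[e + 1:] for row in data]
        labels = cluster(reduced, k)
        # Score how well the cluster means predict the held-out condition e.
        sq = 0.0
        for c in set(labels):
            vals = [row[e] for row, lab in zip(data, labels) if lab == c]
            mean = sum(vals) / len(vals)
            sq += sum((v - mean) ** 2 for v in vals)
        score = math.sqrt(sq / n)
        if adjusted:
            # Yeung et al.'s correction for FOM shrinking as k grows.
            score /= math.sqrt((n - k) / n)
        total += score
    return total
```

For example, on a toy matrix with two well-separated groups of genes, `fom(data, 2)` comes out far lower than `fom(data, 1)`, and a clustering that matches the known classes exactly has entropy 0.0. To mirror the comparison in the abstract, one would compute both scores over a range of k for each algorithm and compare the resulting curves; under these assumed definitions, lower curves are better.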
Source: Journal of Medical Colleges of PLA (China), 2002, No. 4, pp. 312-317 (6 pages); indexed in CAS. Chinese title: 中国人民解放军军医大学学报(英文版) (English edition).
Keywords: gene expression; evaluation of clustering; adjust-FOM; entropy; data structure; data function; classification; model

References (12)

  • [1] Collins FS, Patrinos A, Jordan E, et al. New goals for the U.S. Human Genome Project: 1998-2003 [J]. Science, 1998; 282: 682.
  • [2] Lipshutz RJ, Morris D, Chee M, et al. Using oligonucleotide probe arrays to access genetic diversity [J]. Biotechniques, 1995; 19(3): 442.
  • [3] Schena M, Shalon D, Davis RW, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray [J]. Science, 1995; 270(5235): 467.
  • [4] Schena M, Shalon D, Heller R, et al. Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes [J]. Proc Natl Acad Sci USA, 1996; 93(20): 10614.
  • [5] Heller RA, Schena M, Chai A, et al. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays [J]. Proc Natl Acad Sci USA, 1997; 94(6): 2150.
  • [6] Brazma A, Vilo J. Gene expression data analysis [J]. FEBS Letters, 2000; 480: 17.
  • [7] Geschwind DH. Sharing gene expression data: An array of options [J]. Nat Rev Neurosci, 2001; 2(6): 435.
  • [8] Mutch DM, Berger A, Mansourian R, et al. Microarray data analysis: A practical approach for selecting differentially expressed genes [J]. Genome Biol, 2001; 2(12): PREPRINT0009.
  • [9] Yeung KY, Haynor DR, Ruzzo WL. Validating clustering for gene expression data [J]. Bioinformatics, 2001; 17(4): 309.
  • [11] Ferea TL. Systematic changes in gene expression patterns following adaptive evolution in yeast [J]. Proc Natl Acad Sci USA, 1999; 96(17): 9721.
