
Feature reduction on high-dimensional small-sample data
Abstract: Motivated by the characteristics of a class of high-dimensional, small-sample data, this paper introduces the concept of generalized small samples (GSS) and compresses the information in GSS in two ways: feature extraction (dimensionality reduction) and feature selection (dimension selection). First, unsupervised feature extraction based on principal component analysis (PCA) and supervised feature extraction based on partial least squares (PLS) are introduced. Second, by analyzing the structure of the first component, new global feature selection methods based on PCA and PLS are proposed, followed by a recursive feature elimination method based on PLS (PLS-RFE). Finally, on the MIT AML/ALL classification problem, PCA- and PLS-based feature extraction and feature selection are carried out and compared with PLS-RFE feature selection, achieving information compression of GSS.
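The abstract uses the first component in two ways: projecting the data onto it (feature extraction) and reading feature importance off its coefficients (global feature selection). The sketch below illustrates both in plain Python under stated assumptions: rows are samples and columns are features (genes); the PCA direction is the leading eigenvector of X^T X, found by power iteration; the PLS direction is the first PLS1 weight vector, which for centered data is proportional to X^T y; and features are ranked by the absolute value of their coefficient. The paper's exact ranking rule may differ, and all function names are hypothetical.

```python
import math
import random


def center(X):
    """Subtract each column mean; rows are samples, columns are features."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pca_first_loading(Xc, iters=200, seed=0):
    """Unsupervised direction: leading eigenvector of Xc^T Xc by power iteration.

    Each step computes Xc^T (Xc v), so the p x p matrix is never formed,
    which matters when p (genes) is much larger than n (samples).
    """
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = normalize([rng.random() for _ in range(p)])
    for _ in range(iters):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]  # Xc v
        v = normalize([sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)])
    return v


def pls_first_weight(Xc, y):
    """Supervised direction: first PLS1 weight, proportional to Xc^T (y - mean(y))."""
    n, p = len(Xc), len(Xc[0])
    ybar = sum(y) / n
    yc = [t - ybar for t in y]
    return normalize([sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)])


def component_scores(Xc, v):
    """Feature extraction: project every centered sample onto direction v."""
    return [sum(x * w for x, w in zip(row, v)) for row in Xc]


def top_k_features(weights, k):
    """Global feature selection: indices of the k largest |weights|."""
    return sorted(range(len(weights)), key=lambda j: -abs(weights[j]))[:k]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the two classes, feature 1 has
    # larger variance but is uncorrelated with the labels.
    X = [[1.0, 3.0], [1.0, -3.0], [-1.0, 3.0], [-1.0, -3.0]]
    y = [1, 1, 0, 0]
    Xc = center(X)
    print(top_k_features(pca_first_loading(Xc), 1))    # [1]: PCA follows variance
    w = pls_first_weight(Xc, y)
    print(top_k_features(w, 1))                         # [0]: PLS follows the labels
    print(component_scores(Xc, w))                      # [1.0, 1.0, -1.0, -1.0]
```

On this toy data PCA ranks the high-variance but label-independent feature first, while PLS ranks the class-separating feature first, which is the practical difference between the unsupervised and supervised routes.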
Source: Computer Engineering and Applications, 2009, No. 36, pp. 165-169 (5 pages). Indexed in CSCD; Peking University core journal.
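Funding: Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070384003); Science and Technology Project of the Fujian Provincial Department of Education (No. JB08244)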
Keywords: generalized small sample; Principal Component Analysis (PCA); Partial Least Squares (PLS); feature extraction; feature selection
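The abstract names PLS-RFE but does not spell out its elimination rule. The following is a minimal sketch modelled on SVM-RFE (Guyon et al.), assuming a PLS1 regression on numeric class labels (e.g. 0/1) fitted by NIPALS, a ranking criterion equal to the absolute regression coefficient |b_j|, and removal of one feature per round. These choices and the function names are assumptions, not the paper's published procedure. More than one component is used on purpose: with a single component each weight is proportional to that feature's covariance with y, so removing features never changes the ranking of the rest and the recursion would collapse into the one-shot global selection.

```python
import math


def pls1_coefficients(Xc, yc, ncomp):
    """Regression coefficients b of a PLS1 model (NIPALS) on centered Xc, yc."""
    n, p = len(Xc), len(Xc[0])
    Xa = [list(row) for row in Xc]  # deflated copy of X
    ya = list(yc)                   # deflated copy of y
    W, P, Q = [], [], []
    for _ in range(ncomp):
        w = [sum(Xa[i][j] * ya[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # y is fully explained; stop early
            break
        w = [x / norm for x in w]
        t = [sum(Xa[i][j] * w[j] for j in range(p)) for i in range(n)]       # scores
        tt = sum(x * x for x in t)
        pv = [sum(Xa[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(ya[i] * t[i] for i in range(n)) / tt                          # y loading
        Xa = [[Xa[i][j] - t[i] * pv[j] for j in range(p)] for i in range(n)]
        ya = [ya[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(pv)
        Q.append(q)
    # Express each component in the original X space:
    # r_a = (I - w_1 p_1^T) ... (I - w_{a-1} p_{a-1}^T) w_a, and b = sum_a q_a r_a.
    b = [0.0] * p
    for a in range(len(W)):
        r = list(W[a])
        for c in range(a - 1, -1, -1):
            s = sum(P[c][j] * r[j] for j in range(p))
            r = [r[j] - W[c][j] * s for j in range(p)]
        b = [b[j] + Q[a] * r[j] for j in range(p)]
    return b


def pls_rfe(X, y, n_keep, ncomp=2):
    """Refit PLS1, drop the feature with the smallest |b_j|, repeat until n_keep remain."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    remaining = list(range(p))  # original indices of surviving features
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in Xc]
        b = pls1_coefficients(sub, yc, min(ncomp, len(remaining)))
        worst = min(range(len(remaining)), key=lambda k: abs(b[k]))
        remaining.pop(worst)
    return remaining


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes; features 1 and 2 do not.
    X = [[2.0, 1.0, 1.0], [2.0, -1.0, -1.0], [-2.0, 1.0, -1.0], [-2.0, -1.0, 1.0]]
    y = [1, 1, 0, 0]
    print(pls_rfe(X, y, n_keep=1))  # [0]
```

Because |b_j| depends on units, features on very different scales would normally be standardized first. Removing one feature per round costs one PLS fit per feature; for thousands of genes, a common shortcut is to remove a fixed fraction per round.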

References (9)

  • [1] Golub T R, Slonim D K, Tamayo P, et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439): 531-537.
  • [2] Nguyen D V, Rocke D M. Tumor classification by partial least squares using microarray gene expression data[J]. Bioinformatics, 2002, 18(1): 39-50.
  • [3] Nguyen D V, Rocke D M. Multi-class cancer classification via partial least squares with gene expression profiles[J]. Bioinformatics, 2002, 18(9): 1216-1226.
  • [4] Nguyen D V, Rocke D M. On partial least squares dimension reduction for microarray-based classification: A simulation study[J]. Computational Statistics & Data Analysis, 2004, 46(9): 407-425.
  • [5] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning, 2002, 46(1-3): 389-422.
  • [6] Ruan Xiaogang, Li Yingxin, Li Jiangeng, Gong Daoxiong, Wang Jinlian. Study of tumor-specific gene expression patterns based on gene expression profiles[J]. Science in China (Series C), 2006, 36(1): 86-96.
  • [7] Massey W F. Principal components regression in exploratory statistical research[J]. Journal of the American Statistical Association, 1965, 60: 234-246.
  • [8] Wold S, Ruhe A, Wold H, et al. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses[J]. SIAM Journal on Scientific and Statistical Computing, 1984, 5: 735-743.
  • [9] Lorber A, Wangen L, Kowalski B. A theoretical foundation for the PLS algorithm[J]. Journal of Chemometrics, 1987, 1: 19-31.

