Abstract
Cancer gene expression data are high-dimensional with small sample sizes, so dimensionality reduction is necessary. Traditional linear dimensionality reduction methods cannot capture the nonlinear relationships between data points and therefore perform poorly on such data. This paper introduces a multiple-weights locally linear embedding algorithm with improved distance (DMLLE) to reduce the dimensionality of these data. The algorithm uses an improved distance to find the nearest neighbors of each data point, introduces multiple sets of linearly independent local weight vectors for each neighbor to perform linear reconstruction, and obtains the low-dimensional embedding of the high-dimensional data by minimizing the reconstruction error. Experimental results show that DMLLE achieves good dimensionality reduction on cancer gene expression data.
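The pipeline named in the abstract (find neighbors, solve for local reconstruction weights, minimize the reconstruction error in low dimension) can be illustrated with a minimal sketch of standard LLE. This is not the DMLLE algorithm: the abstract does not define the improved distance or the multiple weight vectors, so the sketch assumes plain Euclidean neighbors and a single weight vector per point. The function name `lle`, the regularization constant `reg`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Standard LLE: Euclidean neighbors, one weight vector per point."""
    n = X.shape[0]

    # Step 1: nearest neighbors by squared Euclidean distance.
    # (DMLLE would use its improved distance here; not specified in the abstract.)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct each point from its neighbors,
    # constrained to sum to one.
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[neighbors[i]] - X[i]          # neighbors centered on x_i
        G = Z @ Z.T                         # local Gram matrix (k x k)
        G = G + (reg * np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, ones)
        W[i, neighbors[i]] = w / w.sum()

    # Step 3: embedding that minimizes the reconstruction error
    # sum_i ||y_i - sum_j W_ij y_j||^2, via the bottom eigenvectors of M.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    Y = vecs[:, 1:n_components + 1]         # skip the first (near-zero) eigenvector
    return Y, W


# Synthetic stand-in for expression data: 40 samples, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
Y, W = lle(X, n_neighbors=8, n_components=2)
print(Y.shape)
```

The regularization of the local Gram matrix matters in the setting the abstract describes: when the number of neighbors is large relative to the local intrinsic dimension, the matrix is close to singular and the weights are otherwise unstable.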
Source
Journal of Biomedical Engineering (《生物医学工程学杂志》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2014, Issue 1, pp. 85-90 (6 pages)
Keywords
nonlinear dimensionality reduction
locally linear embedding
cancer gene expression data
multiple reconstruction weights