Abstract
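Locally linear embedding (LLE) is a manifold dimensionality reduction method. To address the shortcomings of LLE in high-dimensional sparse data spaces, namely that it is ill-suited to sparse sampling and to the Euclidean distance formula, this paper studies an extension of the algorithm that introduces a kernel function and maps the samples into a high-dimensional feature space. The kernel mapping improves the spatial distribution of the samples, and the improved LLE method gives good results when the number of nearest neighbors is chosen appropriately. For the low-dimensional data set recovered from the high-dimensional sampled data, the outlier data hypothesis proposed in this paper, combined with the outlier clustering method given here, is used to decide whether each low-dimensional point is an outlier. Simulation experiments show that the method effectively finds outliers in high-dimensional data sets. The algorithm also has the advantages of simple parameter estimation and low sensitivity to its parameters, and it offers a new machine learning approach to the outlier detection problem.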
Locally linear embedding (LLE) is a manifold-based dimensionality reduction method. For sparse, high-dimensional data spaces, we investigate an extension of LLE that uses a kernel function and is adapted to sparse samples. The kernel function maps the samples into a high-dimensional feature space, where they are classified, and this kernel mapping improves the distribution of the samples. When the number K of nearest neighbors is chosen appropriately, the method gives good results. The approach transforms nonlinear large-scale data into linear data in the feature space and introduces a nonlinear data transformation to reduce the data dimension. Based on an outlier data hypothesis, outliers are then identified by an algorithm called clustering with outliers detection. Simulation results show that the algorithm is effective at finding outliers in high-dimensional data sets. Moreover, the method has the advantages of simple parameter estimation and low parameter sensitivity, and it provides a new approach to outlier detection.
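The kernel step described in the abstract can be illustrated with a minimal sketch. It assumes an RBF kernel and a standard LLE weight solve carried out entirely through kernel values. The abstract does not name the kernel or give the exact procedure, so the function names, the `gamma` and `reg` parameters, and the regularization scheme are all illustrative assumptions rather than the authors' implementation. The outlier clustering stage is not sketched because the abstract gives no details of it.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix (assumed kernel; the abstract does not specify one)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_distances(K):
    """Feature-space distances: ||phi(x)-phi(y)||^2 = k(x,x) - 2k(x,y) + k(y,y)."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.maximum(d2, 0.0))

def kernel_lle(X, n_neighbors=5, n_components=2, gamma=1.0, reg=1e-3):
    """LLE with neighbors and reconstruction weights computed in feature space."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    D = kernel_distances(K)
    W = np.zeros((n, n))
    for i in range(n):
        # K nearest neighbors of sample i under the kernel-induced distance.
        order = np.argsort(D[i])
        nbrs = order[order != i][:n_neighbors]
        # Local Gram matrix of (phi(x_i) - phi(x_j)) using kernel values only.
        G = (K[i, i] - K[i, nbrs][:, None] - K[i, nbrs][None, :]
             + K[np.ix_(nbrs, nbrs)])
        # Regularize so the linear system stays well conditioned.
        G = G + (reg * np.trace(G) + 1e-12) * np.eye(len(nbrs))
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()  # weights sum to one
    # Embedding: bottom eigenvectors of (I - W)^T (I - W), skipping the first.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))  # synthetic data, for illustration only
    Y = kernel_lle(X, n_neighbors=6, n_components=2, gamma=0.1)
    print(Y.shape)
```

Working with the kernel matrix alone means the high-dimensional feature map never has to be formed explicitly. Both the neighbor search and the local Gram matrices depend only on kernel evaluations.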
Source
《仪器仪表学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2008, No. 9, pp. 1996-2000 (5 pages)
Chinese Journal of Scientific Instrument
Keywords
kernel function
dimensionality reduction
nonlinear datasets
outliers
clustering