
Clustering of High-Dimensional Data Based on Few-Shot Learning in Heterogeneous Environments
Cited by: 1
Abstract: In heterogeneous environments, different data sources suffer from a semantic gap because they differ in semantic expression and representation form. Moreover, data in heterogeneous environments change dynamically, so assumptions about the data distribution become invalid over time. To improve clustering accuracy in heterogeneous environments, a high-dimensional data clustering method based on few-shot learning is proposed. First, data features with high similarity are provisionally grouped into the same category, and a feature importance scoring function is defined from isometric mapping and a sparse coefficient matrix; based on the feature scores within each feature cluster, the highest-scoring representative features are selected to form a feature subset. Next, because few-shot learning does not require assumptions about the complex distribution of all the data, the method simplifies its data distribution assumptions and thereby avoids the problem of those assumptions becoming invalid as the data change over time. After a small number of samples are labeled, a deep three-dimensional neural network is used to obtain a projection function that maps the high-dimensional data features to low-dimensional semantic representations. Finally, semantic commonalities are learned by computing relation scores between embeddings and class prototypes in the embedding space. During clustering, data with similar semantics but different language representations are grouped into one cluster, which addresses the semantic gap in heterogeneous environments and achieves effective clustering of high-dimensional data. Experiments show that the method clusters high-dimensional data well.
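
The abstract does not specify the exact importance-scoring function, the network architecture, or the relation module, so the following is only a minimal NumPy sketch of the pipeline's overall shape under loudly stated assumptions. Features are grouped by absolute Pearson correlation with an assumed threshold of 0.8; plain variance stands in for the isometric-mapping and sparse-coefficient score; a PCA projection stands in for the deep three-dimensional network; and negative squared Euclidean distance to class prototypes (as in prototypical networks) stands in for the learned relation score. All function names (`group_features`, `select_representatives`, `pca_embed`, `assign_to_prototypes`) are hypothetical.

```python
import numpy as np


def group_features(X: np.ndarray, threshold: float = 0.8) -> list[list[int]]:
    """Greedy grouping: each group collects the unassigned features whose
    absolute Pearson correlation with the group's seed is >= threshold.
    ASSUMPTION: stand-in for the paper's similarity-based feature grouping."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    groups: list[list[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if corr[seed, j] >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        groups.append(members)
    return groups


def select_representatives(X: np.ndarray, groups: list[list[int]], score_fn) -> list[int]:
    """Keep the highest-scoring feature of each group to form the feature subset.
    ASSUMPTION: score_fn replaces the paper's importance-scoring function."""
    scores = [score_fn(X[:, j]) for j in range(X.shape[1])]
    return [max(g, key=lambda j: scores[j]) for g in groups]


def pca_embed(X: np.ndarray, k: int) -> np.ndarray:
    """Project centered data onto its top-k principal components.
    ASSUMPTION: linear placeholder for the learned deep 3D network projection."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def assign_to_prototypes(Z: np.ndarray, support_idx: np.ndarray,
                         support_y: np.ndarray) -> np.ndarray:
    """Average the few labeled embeddings of each class into a prototype, then
    give every sample the label of the prototype with the highest score.
    ASSUMPTION: score = negative squared Euclidean distance (fixed, not learned)."""
    classes = np.unique(support_y)
    protos = np.stack([Z[support_idx[support_y == c]].mean(axis=0) for c in classes])
    scores = -((Z[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmax(scores, axis=1)]


# Synthetic demo: two redundant copies each of an informative and a noise signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
a = np.where(y == 0, -2.0, 2.0) + rng.normal(size=100)  # class-dependent signal
b = rng.normal(size=100)                                # uninformative signal
X = np.column_stack([a, a + 0.1 * rng.normal(size=100),
                     b, b + 0.1 * rng.normal(size=100)])

groups = group_features(X)                        # expected: [[0, 1], [2, 3]]
reps = select_representatives(X, groups, np.var)  # one feature per group
Z = pca_embed(X[:, reps], k=1)                    # low-dimensional embedding
support_idx = np.array([0, 1, 2, 3, 4, 50, 51, 52, 53, 54])  # 5 labeled per class
labels = assign_to_prototypes(Z, support_idx, y[support_idx])
accuracy = float((labels == y).mean())
```

Replacing `score_fn`, `pca_embed`, and the distance-based score with the paper's actual components would leave the two-stage structure (group, then keep one representative per group; embed, then assign to the best-scoring prototype) unchanged.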
Authors: YANG Fan (杨帆), LIU Lu (刘璐) (Shandong Taishan Pumped Storage Co., Ltd., Tai'an, Shandong 271000, China)
Source: Computer Simulation (《计算机仿真》), 2025, No. 11, pp. 262-265, 346 (5 pages)
Funding: Tai'an Science and Technology Innovation Action Plan Project (2018D1396).
Keywords: Heterogeneous Environment; Few-Shot Learning; Data Distribution Assumption; High-Dimensional Data; Data Clustering; Deep Three-Dimensional Neural Network