Abstract
This paper proposes an unsupervised feature extraction method based on an explicit semantic quantum entangling model and a latent quantum co-occurrence model. The method addresses the lack of support for incremental and distributed computation in current text clustering approaches, and lays a foundation for large-scale text clustering in the Internet environment as well as for single-document and multi-document summarization. The model requires no domain knowledge base and still maintains good clustering performance after about 96% of the redundant information is removed. Theoretical analysis and numerical experiments show that the model is effective.
Source
《吉林大学学报(理学版)》
CAS
CSCD
PKU Core (北大核心)
2010, No. 1, pp. 79-84 (6 pages)
Journal of Jilin University: Science Edition
Funding
National Key Basic Research Program of China (973 Program), Grant No. 2004CB318000
Keywords
unsupervised
feature extraction
entangling model
window function