Research on an Incremental Clustering Algorithm Based on High-Weight Word Sets (cited by: 1)

Chinese Text Clustering Research And Implementation
Abstract: Text clustering, an unsupervised machine learning method, has become an important means of organizing, summarizing, and navigating text information, and it is attracting growing attention from researchers. Set against the background of topic detection and tracking in online forums, this paper obtains topics by clustering forum posts. Building on and improving the hierarchical clustering algorithm, it introduces the concept of a high-weight word set and, on that basis, designs and implements an incremental clustering algorithm. Experiments show that the algorithm adapts to dynamic data and has advantages in time and space complexity, and they confirm that the system architecture adopted in the design is sound and necessary.
Source: 《微计算机信息》 (Control & Automation), 2011, No. 2, pp. 170-172 (3 pages)
Keywords: text clustering; high-weight word set; hierarchical clustering; incremental clustering
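
The abstract names the idea of incremental clustering over high-weight word sets but does not define it, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes that a post's "high-weight word set" is its `k` most frequent tokens (the paper's actual weighting scheme is not given here), that similarity is Jaccard overlap between word sets, and that a post joins the best-matching cluster when the overlap reaches a threshold. The names `high_weight_words`, `jaccard`, and `IncrementalClusterer` are hypothetical.

```python
from collections import Counter


def high_weight_words(tokens, k=3):
    """Return the k most frequent tokens as a stand-in high-weight word set.

    Assumption: raw term frequency is the weight; ties break alphabetically.
    """
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusterer:
    """Assigns each arriving post to a cluster without re-clustering old posts."""

    def __init__(self, threshold=0.5, k=3):
        self.threshold = threshold  # minimum overlap needed to join a cluster
        self.k = k                  # size of each word set
        self.clusters = []          # each: {"words": Counter, "members": [ids]}

    def _signature(self, cluster):
        # A cluster's word set: the k words that appeared in the most
        # member word sets so far (ties broken alphabetically).
        counts = cluster["words"]
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        return frozenset(ranked[: self.k])

    def add(self, doc_id, tokens):
        """Place one post and return the index of the cluster it went to."""
        words = high_weight_words(tokens, self.k)
        best, best_sim = None, 0.0
        for idx, cluster in enumerate(self.clusters):
            sim = jaccard(words, self._signature(cluster))
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            self.clusters[best]["words"].update(words)
            self.clusters[best]["members"].append(doc_id)
            return best
        # No cluster is similar enough: the post starts a new topic.
        self.clusters.append({"words": Counter(words), "members": [doc_id]})
        return len(self.clusters) - 1
```

In this sketch each new post is compared only against one small word set per cluster, never against every earlier post, which is one plausible way an incremental scheme could save time and memory on a growing forum. Real Chinese text would first need word segmentation to produce the token lists.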