
A Text Clustering Method Based on Auto-Selected Threshold (一种基于自动阈值发现的文本聚类方法)

Cited by: 16
Abstract: Text clustering has drawn growing attention as the volume of text on the Web surges and practical applications demand it. After analyzing the characteristics of text and the commonly used text clustering methods, this paper proposes a text clustering method that first partitions the texts finely to obtain fine clusters and then clusters those fine clusters with an agglomerative hierarchical method. For the clustering process, it also proposes a method that discovers the threshold automatically by polynomial curve fitting, and applies that method in the step that finds the fine clusters. An experimental comparison with agglomerative hierarchical clustering shows that the method with automatic threshold discovery performs well in time cost, clustering quality, and tolerance of outliers.
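Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2004, No. 10, pp. 1748-1753 (6 pages)
Funding: National Natural Science Foundation of China (60173051)
Keywords: text clustering; fine clusters; auto-selected threshold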
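
The abstract says the threshold is discovered by polynomial curve fitting but does not say how the fitted curve yields the threshold. The sketch below is one possible reading, not the authors' procedure: it assumes the pairwise similarities are sorted in descending order, fits a polynomial to that curve, and takes the fitted value at the point of steepest descent as the threshold. The function name `auto_threshold`, the parameters `degree` and `grid_size`, and the steepest-descent rule are all illustrative assumptions.

```python
import numpy as np


def auto_threshold(similarities, degree=3, grid_size=200):
    """Pick a similarity threshold from a polynomial fit of the sorted values.

    Assumption (not taken from the paper): the threshold is the fitted value
    at the point where the fitted curve falls most steeply.
    """
    y = np.sort(np.asarray(similarities, dtype=float))[::-1]  # descending order
    x = np.linspace(0.0, 1.0, len(y))                         # normalized rank
    coeffs = np.polyfit(x, y, degree)                         # least-squares fit
    slope = np.polyder(coeffs)                                # first derivative
    grid = np.linspace(0.0, 1.0, grid_size)
    steepest = grid[np.argmin(np.polyval(slope, grid))]       # most negative slope
    return float(np.polyval(coeffs, steepest))


# Hypothetical input: ten high and ten low pairwise similarities.
sims = [0.9] * 10 + [0.1] * 10
threshold = auto_threshold(sims)
```

On this toy input the threshold falls between the two groups of similarities, so pairs above it could be merged into the same fine cluster. A real implementation would need to follow the selection rule given in the full paper.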


