
Clustering GML Documents by Structure Based on Maximal Frequent Induced Subtrees (cited by: 2)
Abstract: This paper proposes TBCClustering, a new algorithm for clustering GML documents by structure based on maximal frequent induced subtrees. The algorithm mines the maximal frequent induced subtrees of a GML document collection to construct a feature space, then optimizes that feature space. It clusters the GML documents with the CLOPE clustering algorithm, and it determines both the minimum support and the number of clusters automatically, so the user does not have to set them. This reduces the dimensionality of the features and also yields higher clustering precision. Experimental results show that TBCClustering is effective and that it outperforms the PBClustering algorithm.
Source: Journal of Nanjing Normal University (Engineering and Technology Edition), CAS, 2008, No. 4, pp. 50-55 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (40771163)
Keywords: GML clustering by structure; maximal frequent induced subtrees; closed frequent induced subtrees
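
The clustering step named in the abstract can be illustrated with a minimal sketch of the general CLOPE idea. It is not the authors' TBCClustering implementation. The abstract does not say how documents are encoded, so the sketch assumes each document is represented by the set of IDs of the mined subtrees it contains. The names `clope` and `_Cluster`, the feature IDs, and the repulsion value `r=2.0` are hypothetical choices made for illustration.

```python
from collections import Counter


class _Cluster:
    """Running statistics of one CLOPE cluster (hypothetical helper)."""

    def __init__(self):
        self.n = 0            # number of documents in the cluster
        self.size = 0         # total number of feature occurrences
        self.occ = Counter()  # feature -> number of documents containing it

    @staticmethod
    def _value(size, width, n, r):
        # Contribution of one cluster to the CLOPE profit numerator.
        return size * n / width ** r if width else 0.0

    def delta_add(self, doc, r):
        # Change in the cluster's contribution if `doc` were added.
        new_width = len(self.occ) + sum(1 for f in doc if f not in self.occ)
        after = self._value(self.size + len(doc), new_width, self.n + 1, r)
        return after - self._value(self.size, len(self.occ), self.n, r)

    def add(self, doc):
        self.n += 1
        self.size += len(doc)
        self.occ.update(doc)

    def remove(self, doc):
        self.n -= 1
        self.size -= len(doc)
        for f in doc:
            self.occ[f] -= 1
            if self.occ[f] == 0:
                del self.occ[f]


def _best_cluster(doc, clusters, r, prefer=0):
    # Keep one empty cluster available so a document can open a new one.
    if not any(c.n == 0 for c in clusters):
        clusters.append(_Cluster())
    best, best_gain = prefer, clusters[prefer].delta_add(doc, r)
    for k, c in enumerate(clusters):
        gain = c.delta_add(doc, r)
        if gain > best_gain:  # switch only on a strict improvement
            best, best_gain = k, gain
    return best


def clope(docs, r=2.0, max_passes=100):
    """Cluster feature sets; the number of clusters is not given in advance."""
    docs = [frozenset(d) for d in docs]
    clusters, labels = [], []
    for doc in docs:  # phase 1: greedy initial assignment
        k = _best_cluster(doc, clusters, r)
        clusters[k].add(doc)
        labels.append(k)
    for _ in range(max_passes):  # phase 2: move documents until stable
        moved = False
        for i, doc in enumerate(docs):
            clusters[labels[i]].remove(doc)
            k = _best_cluster(doc, clusters, r, prefer=labels[i])
            clusters[k].add(doc)
            moved = moved or k != labels[i]
            labels[i] = k
        if not moved:
            break
    # Renumber labels compactly in order of first appearance.
    remap = {}
    return [remap.setdefault(k, len(remap)) for k in labels]


# Hypothetical documents, each given as the set of subtree IDs it contains.
docs = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s7", "s8"}, {"s7", "s8", "s9"}]
labels = clope(docs)
```

In this sketch the number of clusters emerges from the profit criterion instead of being supplied by the caller, which matches the abstract's claim that the cluster count need not be set by the user. A larger `r` favours more, tighter clusters.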

