
Research on a Document Clustering Method Based on Schema Elements

A Research on Clustering Method Based on Element of XML Schema
Abstract: The key to clustering is grouping similar things together, so similarity computation is the primary problem in document clustering. An XML schema captures the structure of XML documents, so XML documents can be clustered by clustering their schemas. This paper proposes a document clustering method based on XML schema elements: documents are clustered by computing the similarity between schema elements, since elements are the main body of XML. The approach takes both the structural and the semantic information of elements into account, which makes the similarity computation more precise, improves clustering accuracy, and makes it easy to extract a common XML schema for each cluster.
Authors: Sun Xia (孙霞), Zhang Yusheng (张玉生)
Source: Journal of Changshu Institute of Technology (《常熟理工学院学报》), 2012, No. 8, pp. 94-98 (5 pages)
Keywords: element; schema; similarity; clustering
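
The abstract describes clustering schemas by an element similarity that combines structure and semantics, but this record does not give the paper's formulas. The following is only a minimal sketch of that general idea under stated assumptions, not the authors' method: each schema is assumed to be a non-empty list of slash-separated element paths, a plain string ratio from Python's `difflib` stands in for a lexical semantic measure, and the weight `w`, the `threshold`, and every function name are hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: a schema is a non-empty list of element paths such as "book/author".
# All names, weights and thresholds here are illustrative, not taken from the paper.

def semantic_sim(a: str, b: str) -> float:
    """Name similarity; a string ratio used as a stand-in for a lexical resource."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def structural_sim(path_a: str, path_b: str) -> float:
    """Share of ancestor positions, counted from the root, that carry the same name."""
    anc_a, anc_b = path_a.split("/")[:-1], path_b.split("/")[:-1]
    if not anc_a and not anc_b:
        return 1.0  # two root elements have the same (empty) ancestor context
    same = sum(x == y for x, y in zip(anc_a, anc_b))
    return same / max(len(anc_a), len(anc_b))

def element_sim(path_a: str, path_b: str, w: float = 0.5) -> float:
    """Weighted mix of name similarity and ancestor-path similarity."""
    name_a, name_b = path_a.rsplit("/", 1)[-1], path_b.rsplit("/", 1)[-1]
    return w * semantic_sim(name_a, name_b) + (1 - w) * structural_sim(path_a, path_b)

def schema_sim(s1: list, s2: list) -> float:
    """Symmetric average of each element's best match in the other schema."""
    def one_way(xs, ys):
        return sum(max(element_sim(x, y) for y in ys) for x in xs) / len(xs)
    return (one_way(s1, s2) + one_way(s2, s1)) / 2

def cluster(schemas: dict, threshold: float = 0.7) -> list:
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    clusters = []
    for name, elems in schemas.items():
        for c in clusters:
            if schema_sim(elems, schemas[c[0]]) >= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Toy input: two library-like schemas and one unrelated order schema.
schemas = {
    "lib1": ["book", "book/title", "book/author"],
    "lib2": ["book", "book/title", "book/writer"],
    "shop": ["order", "order/item", "order/item/price"],
}
groups = cluster(schemas)
print(groups)
```

On this toy input the two library-like schemas end up in one cluster and the order schema in another. A fuller treatment would replace the string ratio with a real semantic measure and would derive a common schema per cluster, which this sketch does not attempt.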
