Abstract
The key to clustering is grouping similar things together, so computing similarity is the primary problem in document clustering. An XML schema captures the structure of XML documents, so XML documents can be clustered by clustering their schemas. This paper proposes a document clustering method based on XML schema elements: it clusters documents by computing the similarity between schema elements, since elements are the main body of XML. The approach takes into account both the structural and the semantic information of elements, which makes the similarity calculation more precise, improves clustering accuracy, and makes it easy to extract a common XML schema for each cluster.
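The abstract does not give the similarity formula or the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. Every name in it (`element_similarity`, `schema_similarity`, `cluster_schemas`, the sample schemas, the 0.5 weight and the 0.6 threshold) is hypothetical. String matching on element names stands in for semantic similarity, matching of ancestor paths stands in for structural similarity, and a simple single-pass threshold clustering stands in for whatever algorithm the paper actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical schemas: each element is written as its root-to-element path.
BOOK_A = [("book",), ("book", "title"), ("book", "author"), ("book", "price")]
BOOK_B = [("book",), ("book", "title"), ("book", "writer"), ("book", "cost")]
ORDER = [("order",), ("order", "customer"), ("order", "item"),
         ("order", "item", "quantity")]


def element_similarity(a, b, name_weight=0.5):
    """Weighted mix of name similarity and ancestor-path similarity.

    Assumption: character-level matching of names is a crude stand-in for
    semantic similarity (a thesaurus-based measure would be closer to it).
    """
    name_sim = SequenceMatcher(None, a[-1].lower(), b[-1].lower()).ratio()
    # Compare the ancestor chains (everything except the element itself).
    struct_sim = SequenceMatcher(None, a[:-1], b[:-1]).ratio()
    return name_weight * name_sim + (1 - name_weight) * struct_sim


def schema_similarity(s1, s2):
    """Average best-match element similarity, taken in both directions."""
    best_1 = sum(max(element_similarity(a, b) for b in s2) for a in s1)
    best_2 = sum(max(element_similarity(b, a) for a in s1) for b in s2)
    return (best_1 + best_2) / (len(s1) + len(s2))


def cluster_schemas(schemas, threshold=0.6):
    """Single-pass clustering: join the first cluster whose first member
    is similar enough, otherwise start a new cluster."""
    clusters = []
    for i, schema in enumerate(schemas):
        for members in clusters:
            if schema_similarity(schemas[members[0]], schema) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


print(cluster_schemas([BOOK_A, BOOK_B, ORDER]))
```

With these sample inputs the two book schemas share a cluster and the order schema forms its own. The weight and threshold are illustrative values, not taken from the paper.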
Source
Journal of Changshu Institute of Technology (《常熟理工学院学报》)
2012, No. 8, pp. 94-98 (5 pages)
Keywords
element
schema
similarity
clustering