Clu-GML: An algorithm for clustering geography markup language documents by structure
Times cited: 8
Abstract: This paper proposes Clu-GML, a new algorithm for clustering geography markup language (GML) documents by structure. Unlike other tree-based clustering algorithms, Clu-GML introduces the computation of representative trees into agglomerative hierarchical clustering. The representative tree of a cluster is obtained by computing its maximal frequent induced subtrees. New clusters are found by comparing representative trees, and the representative trees of the new clusters are then updated until clustering is complete. All previous work on this problem computes the similarity or distance between every pair of documents. This is time-consuming, and on large datasets the running time becomes unacceptable. Clu-GML instead only computes the similarity between the representative trees of two clusters, which keeps it fast and scalable on very large datasets. This both reduces the running time and produces a description of every cluster. Experimental results show that Clu-GML is effective and outperforms other GML clustering algorithms.
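The abstract criticizes the pairwise approach, and its cost can be made concrete. For n documents, computing a similarity or distance for every pair of documents requires the following number of comparisons. The value n = 10,000 is an illustrative assumption, not a dataset size taken from the paper:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 10\,000 \;\Rightarrow\; \frac{10\,000 \times 9\,999}{2} = 49\,995\,000
```

Each of these comparisons is itself a comparison of two document trees. The total cost therefore grows quadratically with the number of documents before the cost of a single comparison is even counted.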
Source: Journal of Nanjing University (Natural Science), 2008, No. 2, pp. 188-194 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (No. 40771163)
Keywords: geography markup language (GML); clustering by structure; maximal frequent induced subtree
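The abstract describes the clustering loop only at a high level. The Python sketch below illustrates the general idea of merging clusters by comparing their representatives rather than all document pairs. It is not the authors' Clu-GML implementation, and it rests on several assumptions:

- Each document's structure is reduced to its set of root-to-node tag paths.
- A cluster's representative tree is approximated by the paths found in at least a fraction `minsup` of its documents. This is a simplified stand-in for mining the exact maximal frequent induced subtree, which the sketch does not do.
- Representatives are compared with Jaccard similarity against a merge `threshold`.

All function names, parameters, and sample GML fragments are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

LabelPath = tuple[str, ...]  # e.g. ("FeatureCollection", "featureMember", "Road")


def local_name(tag: str) -> str:
    """Drop an ElementTree '{namespace}' prefix: '{...gml}Point' -> 'Point'."""
    return tag.rsplit("}", 1)[-1]


def root_paths(xml_text: str) -> frozenset[LabelPath]:
    """Summarize one document's structure as its set of root-to-node label paths.

    Repeated sibling labels collapse into one path and element order is ignored,
    a deliberate simplification of the ordered-tree model (assumption)."""
    root = ET.fromstring(xml_text)
    paths: set[LabelPath] = set()
    stack = [(root, (local_name(root.tag),))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (local_name(child.tag),)))
    return frozenset(paths)


def representative(docs: list[frozenset[LabelPath]], minsup: float) -> frozenset[LabelPath]:
    """Hypothetical stand-in for a cluster's representative tree: keep each path
    found in at least `minsup` (a fraction) of the member documents. A document
    with a path also has all of its prefixes, so the result is prefix-closed and
    still tree-shaped. It is NOT an exact maximal frequent induced subtree."""
    counts = Counter(p for d in docs for p in d)
    needed = minsup * len(docs)
    return frozenset(p for p, c in counts.items() if c >= needed)


def similarity(a: frozenset[LabelPath], b: frozenset[LabelPath]) -> float:
    """Jaccard similarity of two path sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(docs: list[frozenset[LabelPath]], threshold: float = 0.5,
            minsup: float = 0.5) -> list[tuple[list[int], frozenset[LabelPath]]]:
    """Agglomerative clustering that compares cluster representatives, never
    individual document pairs. Returns (member indices, representative) pairs."""
    clusters = [([i], d) for i, d in enumerate(docs)]
    while len(clusters) > 1:
        # Pick the two clusters whose representatives are most similar.
        (i, j), best = max(
            (((i, j), similarity(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        members = clusters[i][0] + clusters[j][0]
        # Update step: recompute the merged cluster's representative from its members.
        merged = (members, representative([docs[k] for k in members], minsup))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def feature_collection(member: str) -> str:
    """Wrap one hypothetical feature in a minimal GML FeatureCollection."""
    return ('<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">'
            f"<gml:featureMember>{member}</gml:featureMember></gml:FeatureCollection>")


ROAD = "<gml:LineString><gml:coordinates/></gml:LineString>"
WELL = "<gml:Point><gml:pos/></gml:Point>"
DOCS = [root_paths(feature_collection(m)) for m in (
    f"<Road>{ROAD}</Road>",
    f"<Road><name/>{ROAD}</Road>",
    f"<Well>{WELL}</Well>",
    f"<Well><depth/>{WELL}</Well>",
)]

if __name__ == "__main__":
    for members, rep in cluster(DOCS, threshold=0.5, minsup=1.0):
        print(members, len(rep))  # prints "[0, 1] 5" then "[2, 3] 5"
```

With `minsup=1.0`, each representative is exactly the structure common to all of its members, so it can double as the cluster description that the abstract mentions. Lower values tolerate more structural variation within a cluster. Each merge step in this sketch still compares every pair of current representatives. It illustrates the idea only and makes no claim about Clu-GML's actual cost.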