Efficient Mining of Frequent Closed XML Query Pattern

Efficient Mining of Frequent Closed XML Query Pattern

导出

摘要 Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA＊, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA＊ deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA＊ over the previous known alternative. SOLARIA＊ is also linearly scalable in terms of XML queries＇ size. Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA＊, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA＊ deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA＊ over the previous known alternative. SOLARIA＊ is also linearly scalable in terms of XML queries＇ size.

作者冯建华钱乾王建勇周立柱

机构地区 Department of Computer Science and Technology Tsinghua University

出处《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第5期725-735,共11页 计算机科学技术学报（英文版）

基金 This work is supported in part by the National Natural Science Foundation of China under Grant No.60573094 the National Grand Fundamental Research 973 Program of China under Grant No.2006CB303103 the National High Technology Development 863 Program of China under Grant No.2006AA01A101 Tsinghua Basic Research Foundation under Grant No.JCqn2005022.

关键词 computer software frequent closed pattern data mining XML XPATH computer software, frequent closed pattern, data mining, XML, XPath

分类号 TP312.2 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献28

1Chen Q, Lim A et al. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. the ACM SIGMOD Int. Conf. Management of Data, San Diego, CA, USA, Jun. 9-12, 2003, pp.134-144.
2Kaushik R, Shenoy Pet al. Exploiting local similarity for efficient indexing of paths in graph structured data. In Proc. the 18th Int. Conf. Data Engineering, San Jose, CA, USA, Feb. 26-Mar. 1, 2002, pp.129-140.
3Milo T, Suciu D. Index structures for path expressions. In Proc. the 7th Int. Conf. Database Theory, Jerusalem, Israel, Jan. 10-12, 1999, pp.277-295.
4Yang L H, Lee M Let al. Efficient mining of XML query patterns for caching. In Proc. the 29th Int. Conf. Very Large Data Bases, Berlin, Germany, Sept. 9-12, 2003, pp.69-80.
5Yan X, Han Jet al. Mining closed sequential patterns in large databases. In Proc. the 3rd SIAM Int. Conf. Data Mining, San Francisco, CA, USA, May 1-3, 2003, Electronic Edition.
6Dehaspe L, Toivonen H et al. Finding frequent substructures in chemical compounds. In Proc. 4th Int. Conf. Knowledge Discovery and Data Mining, New York, USA, Aug. 27-31, 1998, pp.30-36.
7Bettini G, Wang X et al. Mining temporal relationals with multiple granularities in time sequences. IEEE Data Engineering Bulletin, 1998, 21(1): 32-38.
8Pei J, Han Jet al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. the 18th Int. Conf. Data Engineering, Heidelberg, Germany, April 2-6, 2001, pp.215-224.
9Feng J, Qian Q et al. Exploit sequencing to accelerate hot XML query pattern mining. In Proc. the 2006 ACM Syrup. Applied Computing, Dijon, France, Apr. 23-27, 2006, pp.517-524.
10Qian Q, Feng Jet al. Exploit sequencing to accelerate XML twig query answering. In Proc. the 11th Int. Conf. Database Systems for Advanced Applications, Singapore, Apr. 12-15, 2006, pp.279-294.

1Yi-jun BEI,Gang CHEN,Jin-xiang DONG,Ke CHEN.Bottom-up mining of XML query patterns to improve XML querying[J].Journal of Zhejiang University-Science A(Applied Physics & Engineering),2008,9(6):744-757.
2王燕,谢金宝.Linux下基于Web的目录服务系统的设计与实现[J].计算机工程,2000,26(12):134-136. 被引量：7
3路燕,张亮,汪卫,段起阳,施伯乐.XML查询中DTD的排序技术[J].计算机研究与发展,2003,40(11):1579-1585. 被引量：6
4Jinhyung KIM,Dongwon JEONG,Doo-Kwon BAIK.VQT:value cardinality and query pattern based R-schema to XML schema translation with implicit referential integrity[J].Journal of Zhejiang University-Science A(Applied Physics & Engineering),2008,9(12):1694-1707. 被引量：1
5苏斐,慈林林,朱丽萍,赵欣欣.Data Caching for XML Query[J].Journal of Beijing Institute of Technology,2006,15(3):296-299.
6李庆忠,王海洋,闫中敏,马绍汉.Efficient Mining of Association Rules by Reducingthe Number of Passes over the Database[J].Journal of Computer Science & Technology,2001,16(2):182-188. 被引量：2
7裴亚杰.OFFICE预警（三）挽救损坏的Word文档[J].计算机应用文摘,2006(9):30-30.
8杨瑾,许冬.10／100Mbps以太网接口卡设计与开发[J].计算机技术,1997(12):35-37.
9汤晓松,胡桂武,张欣.浅谈面向主题的搜索引擎[J].保险职业学院学报,2005(2):45-48.
10吴训威.数字电路(二值-多值)的新理论[J].科技通报,1992,8(1):1-6.

Journal of Computer Science & Technology

2007年第5期

浏览历史

内容加载中请稍等...

Efficient Mining of Frequent Closed XML Query Pattern

参考文献28

相关作者

相关机构

相关主题

浏览历史