期刊文献+

Efficient Mining of Frequent Closed XML Query Pattern

Efficient Mining of Frequent Closed XML Query Pattern
原文传递
导出
摘要 Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previous known alternative. SOLARIA* is also linearly scalable in terms of XML queries' size. Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previous known alternative. SOLARIA* is also linearly scalable in terms of XML queries' size.
出处 《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第5期725-735,共11页 计算机科学技术学报(英文版)
基金 This work is supported in part by the National Natural Science Foundation of China under Grant No.60573094 the National Grand Fundamental Research 973 Program of China under Grant No.2006CB303103 the National High Technology Development 863 Program of China under Grant No.2006AA01A101 Tsinghua Basic Research Foundation under Grant No.JCqn2005022.
关键词 computer software frequent closed pattern data mining XML XPATH computer software, frequent closed pattern, data mining, XML, XPath
  • 相关文献

参考文献28

  • 1Chen Q, Lim A et al. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. the ACM SIGMOD Int. Conf. Management of Data, San Diego, CA, USA, Jun. 9-12, 2003, pp.134-144.
  • 2Kaushik R, Shenoy Pet al. Exploiting local similarity for efficient indexing of paths in graph structured data. In Proc. the 18th Int. Conf. Data Engineering, San Jose, CA, USA, Feb. 26-Mar. 1, 2002, pp.129-140.
  • 3Milo T, Suciu D. Index structures for path expressions. In Proc. the 7th Int. Conf. Database Theory, Jerusalem, Israel, Jan. 10-12, 1999, pp.277-295.
  • 4Yang L H, Lee M Let al. Efficient mining of XML query patterns for caching. In Proc. the 29th Int. Conf. Very Large Data Bases, Berlin, Germany, Sept. 9-12, 2003, pp.69-80.
  • 5Yan X, Han Jet al. Mining closed sequential patterns in large databases. In Proc. the 3rd SIAM Int. Conf. Data Mining, San Francisco, CA, USA, May 1-3, 2003, Electronic Edition.
  • 6Dehaspe L, Toivonen H et al. Finding frequent substructures in chemical compounds. In Proc. 4th Int. Conf. Knowledge Discovery and Data Mining, New York, USA, Aug. 27-31, 1998, pp.30-36.
  • 7Bettini G, Wang X et al. Mining temporal relationals with multiple granularities in time sequences. IEEE Data Engineering Bulletin, 1998, 21(1): 32-38.
  • 8Pei J, Han Jet al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. the 18th Int. Conf. Data Engineering, Heidelberg, Germany, April 2-6, 2001, pp.215-224.
  • 9Feng J, Qian Q et al. Exploit sequencing to accelerate hot XML query pattern mining. In Proc. the 2006 ACM Syrup. Applied Computing, Dijon, France, Apr. 23-27, 2006, pp.517-524.
  • 10Qian Q, Feng Jet al. Exploit sequencing to accelerate XML twig query answering. In Proc. the 11th Int. Conf. Database Systems for Advanced Applications, Singapore, Apr. 12-15, 2006, pp.279-294.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部