Funding: Project supported by the National Natural Science Foundation of China (Nos. 60603044 and 60803003), the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT0652), and the Key Technology Projects of Zhejiang Province, China (No. 2006c11108).
Abstract: Finding all occurrences of a twig pattern is a core operation of extensible markup language (XML) query processing. Holistic twig join algorithms, which avoid generating large numbers of intermediate results, represent the state of the art. However, ordered XML twig joins are rarely addressed in the literature, and previous algorithms for ordered twig pattern (OTP) matching perform poorly. In this paper, we first propose a novel children linked stacks encoding scheme that compactly represents partial ordered twig join results. Based on this encoding scheme and extended Dewey labeling, we design a novel holistic OTP matching algorithm, called OTJFast, which needs to access only the labels of the leaf query nodes. Furthermore, we propose a new algorithm, named OTJFaster, which incorporates three effective optimization rules to avoid unnecessary computations. OTJFaster works well with available indices (such as B+-trees) by skipping useless elements, so disk access is greatly reduced and many unnecessary computations are avoided. Finally, our extensive experiments over both real and synthetic datasets indicate that our algorithms are superior to previous approaches.
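The abstract relies on extended Dewey labels without defining them. Their key property is that a node's label alone yields the tags of every element on its root-to-node path, which is what makes it plausible for an algorithm to read only the labels of leaf query nodes. Below is a minimal sketch of that labeling idea, assuming the ordered child-tag list of every tag is known in advance (the `CHILD_TAGS` table is hypothetical schema information), ignoring text nodes, and using illustrative function names throughout. It is not the paper's implementation, and the children linked stacks encoding is not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema knowledge: the ordered list of tags that children of each
# tag may carry (assumed to come from a DTD or schema). Leaf tags map to [].
CHILD_TAGS: Dict[str, List[str]] = {
    "bib": ["book"],
    "book": ["title", "author", "year"],
    "author": ["first", "last"],
    "title": [], "year": [], "first": [], "last": [],
}

Label = Tuple[int, ...]
Node = Tuple[str, list]  # (tag, list of child nodes)


def assign_labels(root: Node) -> List[Tuple[Label, str]]:
    """Label every element in document order (pre-order).

    A child whose tag is entry i of its parent's n-entry child-tag list gets
    the smallest component x with x % n == i that exceeds its left sibling's.
    """
    out: List[Tuple[Label, str]] = []

    def visit(node: Node, label: Label) -> None:
        tag, children = node
        out.append((label, tag))
        tags = CHILD_TAGS[tag]
        n = len(tags)
        prev: Optional[int] = None
        for child in children:
            i = tags.index(child[0])
            if prev is None:
                x = i  # first child: component is just the tag index
            else:
                x = (prev // n) * n + i
                if x <= prev:  # wrap to the next block of n values
                    x += n
            prev = x
            visit(child, label + (x,))

    visit(root, ())
    return out


def decode_path(root_tag: str, label: Label) -> List[str]:
    """Recover all tags on the root-to-node path from the label alone."""
    path = [root_tag]
    for x in label:
        tags = CHILD_TAGS[path[-1]]
        path.append(tags[x % len(tags)])
    return path


def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[: len(a)] == a


if __name__ == "__main__":
    doc: Node = ("bib", [
        ("book", [("title", []), ("author", [("first", []), ("last", [])]),
                  ("author", [("last", [])]), ("year", [])]),
        ("book", [("title", []), ("year", [])]),
    ])
    for label, tag in assign_labels(doc):
        print(label, tag, decode_path("bib", label))
```

In the example document, the second author of the first book receives label (0, 4), and its child (0, 4, 1) decodes to bib/book/author/last without consulting any other element. Because Python compares tuples lexicographically, sorting labels reproduces document order, which is the relation an ordered twig pattern constrains among sibling query nodes.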
Funding: Partially supported by the National Natural Science Foundation of China (Grant No. 62272066); the Open Research Fund of Guangxi Key Lab of Human-machine Interaction and Intelligent Decision (GXHIID2207); the Sichuan Science and Technology Program (2025ZNSFSC0044, 2025YFHZ0194); the Chengdu Technological Innovation Research and Development Project (2024-YF05-01217-SN); the Chengdu Regional Science and Technology Innovation Cooperation Project (2025-YF11-00050-HZ); the Open Foundation of Key Laboratory of Cyberspace Security, Ministry of Education of China and Henan Key Laboratory of Cyberspace Situation Awareness (KLCS20240106); Ant Group through the CCF-Ant Research Fund (CCF-AFSG RF20240106); and the Open Research Fund of Key Laboratory of Cyberspace Big Data Intelligent Security (Chongqing University of Posts and Telecommunications), Ministry of Education of China (CBDIS202404).
Abstract: Artificial intelligence-enabled database technology, known as AI4DB (Artificial Intelligence for Databases), is an active research area attracting significant attention and innovation. This survey first introduces the background of learning-based database techniques. It then reviews advanced query optimization methods for learning databases, focusing on four popular directions: cardinality/cost estimation, learning-based join order selection, learning-based end-to-end optimizers, and text-to-SQL models. Cardinality/cost estimation methods are classified as supervised or unsupervised according to their learning models, with illustrative examples that explain how they work. Detailed descriptions of various query optimizers are also given to elucidate how each component of a learning query optimizer works. Additionally, we discuss the challenges and development opportunities of learning query optimizers. The survey further explores text-to-SQL models, a new research area within AI4DB. Finally, we consider the future development prospects of learning databases.
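The supervised (query-driven) category of cardinality estimation can be illustrated with a toy example: a model is fit on queries that have already been executed, mapping a query feature to the logarithm of the observed cardinality. The sketch below is an illustration under loudly stated assumptions, not a method from the survey: the two-column table, the single feature (the log of the classic independence-based estimate), and ordinary least squares as the "learning model" are all hypothetical choices made to keep it short. Because column `b` copies column `a`, the independence assumption badly underestimates conjunctive predicates, and on this deliberately simple workload the learned correction recovers the true count.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy table: column b is an exact copy of column a, so the two
# columns are perfectly correlated (a worst case for the independence assumption).
N = 1000
ROWS: List[Tuple[int, int]] = [(i, i) for i in range(N)]


def true_cardinality(x: int) -> int:
    """'Execute' the query a < x AND b < x by scanning the toy table."""
    return sum(1 for a, b in ROWS if a < x and b < x)


def independence_estimate(x: int) -> float:
    """Classic estimate: multiply per-column selectivities (assumes independence)."""
    sel_a = sum(1 for a, _ in ROWS if a < x) / N
    sel_b = sum(1 for _, b in ROWS if b < x) / N
    return N * sel_a * sel_b


def fit_line(features: Sequence[float], targets: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for target ~ w * feature + c (one feature)."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    var = sum((f - mean_f) ** 2 for f in features)
    w = cov / var
    return w, mean_t - w * mean_f


# Hypothetical training workload: query bounds whose true cardinalities were observed.
train_bounds = [10, 50, 100, 200, 400, 800]
X = [math.log(independence_estimate(x)) for x in train_bounds]
Y = [math.log(true_cardinality(x)) for x in train_bounds]
w, c = fit_line(X, Y)


def learned_estimate(x: int) -> float:
    """Correct the independence estimate with the learned log-linear model."""
    return math.exp(w * math.log(independence_estimate(x)) + c)


if __name__ == "__main__":
    x = 250  # a bound not seen during training
    print(true_cardinality(x), independence_estimate(x), round(learned_estimate(x), 3))
```

For the unseen bound 250, the independence estimate is 62.5 while the true cardinality is 250; the fitted slope of 0.5 in log space undoes the squared selectivity. Practical learned estimators use much richer query encodings and models, but in the supervised setting the training signal is of this kind: observed cardinalities of past queries.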