Funding: This work was supported by the National Natural Science Foundation of China under grant nos. 61772091, 61802035, 61962006, 61962038, U1802271, U2001212, and 62072311; the Sichuan Science and Technology Program under grant nos. 2021JDJQ0021 and 22ZDYF2680; the CCF-Huawei Database System Innovation Research Plan under grant no. CCF-HuaweiDBIR2020004A; the Digital Media Art Key Laboratory of Sichuan Province, Sichuan Conservatory of Music, Chengdu, China under grant no. 21DMAKL02; the Chengdu Major Science and Technology Innovation Project under grant no. 2021-YF08-00156-GX; the Chengdu Technology Innovation and Research and Development Project under grant no. 2021-YF05-00491-SN; the Natural Science Foundation of Guangxi under grant no. 2018GXNSFDA138005; the Guangdong Basic and Applied Basic Research Foundation under grant no. 2020B1515120028; the Science and Technology Innovation Seedling Project of Sichuan Province under grant no. 2021006; and the College Student Innovation and Entrepreneurship Training Program of Chengdu University of Information Technology under grant nos. 202110621179 and 202110621186.
Abstract: Accurate cardinality estimation enables the query optimiser to produce good execution plans. Although cardinality estimation has been studied extensively, existing estimators remain inaccurate, and query efficiency therefore cannot be guaranteed. In particular, they struggle to capture the complex relationships between multiple tables in complex database systems, and they perform poorly on complex queries. This study proposes a novel cardinality estimator built on a BiLSTM network augmented with an attention mechanism. First, the columns referenced by the query statements in the training set are sampled and compressed into bitmaps. Then, a Word2vec model embeds the query statements as word vectors. Finally, the BiLSTM network and the attention mechanism process these word vectors. The proposed model accounts for both the correlations between tables and the handling of complex predicates. Extensive experiments evaluating the BiLSTM-Attention Cardinality Estimator (BACE) on the IMDB datasets show that the deep learning model significantly improves the quality of cardinality estimation, which plays a vital role in query optimisation for complex databases.
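The pipeline described above (bitmap-compressed column samples, Word2vec embeddings of the query text, a BiLSTM whose outputs are pooled by attention, and a regression head) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the log-cardinality regression target, and the concatenation of the sample bitmap with the attention summary are all assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionEstimator(nn.Module):
    """Minimal sketch of a BiLSTM-with-attention cardinality estimator.

    Inputs are a sequence of pre-trained Word2vec vectors for the query
    tokens plus a bitmap of sampled base-table rows. All sizes are
    illustrative, not taken from the paper.
    """

    def __init__(self, embed_dim=100, hidden_dim=128, bitmap_dim=1000):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # one score per position
        self.out = nn.Sequential(
            nn.Linear(2 * hidden_dim + bitmap_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # predicted log-cardinality (assumption)
        )

    def forward(self, tokens, bitmap):
        # tokens: (batch, seq_len, embed_dim); bitmap: (batch, bitmap_dim)
        h, _ = self.lstm(tokens)                      # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over positions
        context = (weights * h).sum(dim=1)            # pooled query summary
        return self.out(torch.cat([context, bitmap], dim=1))

# Smoke test: a batch of 8 queries, 20 tokens each.
model = BiLSTMAttentionEstimator()
pred = model(torch.randn(8, 20, 100), torch.randint(0, 2, (8, 1000)).float())
```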
Abstract: Host cardinality estimation is an important research topic in network management and network security. Algorithms based on an array of linear estimators are a common approach, but existing algorithms do not take memory footprint into account when choosing how many estimators to assign to each host. This paper analyses the relationship between memory occupancy and estimation accuracy and compares the effects of different parameters on accuracy. Cardinality estimation is a randomised algorithm, so there is a deviation between the estimated results and the actual cardinalities. This deviation is driven by systematic factors, such as the random parameters inherent in each linear estimator and the random functions used to map a host to different linear estimators. These random factors cannot be reduced by merging multiple estimators, and existing algorithms cannot remove the deviation they cause. In this paper, we treat the estimation deviation as a random variable and propose a sampling method, the linear estimator array step sampling algorithm (L2S), to reduce its influence. L2S improves the accuracy of the estimated cardinalities by estimating and removing the expected value of the random deviation. Because estimator-array algorithms are computationally intensive and take a long time to process high-speed network data in a serial environment, we also propose a method to port them to the Graphics Processing Unit (GPU). Experiments on real-world high-speed network traffic show that L2S reduces the absolute bias by more than 22% on average, with an average extra running time of less than 61 milliseconds.
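The linear estimator underlying such arrays is a bitmap-based probabilistic counter (linear counting): each observed peer is hashed to one bit, and the cardinality is recovered from the fraction of bits still zero. Below is a minimal single-estimator sketch; the bitmap size and hash function are illustrative choices, not the paper's construction, and L2S layers on top of this by sampling many such estimators to estimate and subtract the expected deviation.

```python
import math
import hashlib

class LinearEstimator:
    """Minimal linear-counting sketch: one bitmap per host."""

    def __init__(self, m=1024):
        self.m = m            # bitmap size (illustrative)
        self.bits = [0] * m

    def add(self, key: str):
        # Hash the peer identifier to a single bit position.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        self.bits[h % self.m] = 1

    def estimate(self) -> float:
        zeros = self.bits.count(0)
        if zeros == 0:        # bitmap saturated; the estimate diverges
            return float("inf")
        # Linear-counting estimator: n_hat = -m * ln(zeros / m)
        return -self.m * math.log(zeros / self.m)

# A host that contacted 300 distinct peers yields an estimate near 300.
est = LinearEstimator()
for i in range(300):
    est.add(f"peer-{i}")
print(round(est.estimate()))
```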
Funding: This work was supported by the CCF-Huawei Database System Innovation Research Plan under grant no. CCF-HuaweiDBIR2020004A; the National Natural Science Foundation of China under grant nos. 61772091, 61802035, 61962006, and 61962038; the Sichuan Science and Technology Program under grant nos. 2021JDJQ0021 and 2020YJ0481; and the Digital Media Art Key Laboratory of Sichuan Province, Sichuan Conservatory of Music, Chengdu, China under grant no. 21DMAKL02.
Abstract: Although popular database systems perform well on query optimization, they still produce poor query execution plans when join operations across multiple tables are complex. Such poor plans usually stem from bad cardinality estimates: the estimation models in traditional databases cannot provide high-quality estimates because they are not capable of capturing the correlations between multiple tables effectively. Recently, state-of-the-art learning-based cardinality estimators have been shown to outperform traditional empirical methods, typically by using deep neural networks to model the relationships and correlations among tables. In this paper, we propose a vertical scanning convolutional neural network (abbreviated as VSCNN) that captures the relationships between the words in a query's word vector in order to generate a feature map. The proposed learning-based estimator converts a Structured Query Language (SQL) query from a sentence into a word vector; table names are encoded with one-hot encoding and the data samples as bitmaps, and the two are merged to obtain enough semantic information from the samples. In particular, the feature map obtained by VSCNN contains semantic information about the tables, joins, and predicates of the SQL query. Importantly, to improve the accuracy of cardinality estimation, we propose a negative sampling method that trains the word vectors by gradient descent from the base table and compresses them into a bitmap. Extensive experiments show that the proposed VSCNN-based model reduces the q-error of estimation by at least 14.6% compared with the estimators in traditional databases.
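Reading "vertical scanning" as a convolution whose kernel spans the full embedding width and slides down the token rows of the query's word-vector matrix, a minimal sketch might look like the following. The filter count, window size, max-pooling, and log-cardinality target are all assumptions; the q-error metric the abstract evaluates with (the symmetric ratio between estimate and truth) is included for reference, with the epsilon guard as an added assumption.

```python
import torch
import torch.nn as nn

class VSCNNSketch(nn.Module):
    """Sketch of a vertically scanning convolution: the kernel is as wide
    as the embedding, so it only moves down the token rows, producing a
    feature map that is max-pooled and regressed to a cardinality."""

    def __init__(self, embed_dim=100, n_filters=64, window=3):
        super().__init__()
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(window, embed_dim))
        self.out = nn.Linear(n_filters, 1)  # predicted log-cardinality

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) word vectors for one SQL query
        fmap = torch.relu(self.conv(x.unsqueeze(1)))  # (batch, f, seq-w+1, 1)
        pooled = fmap.squeeze(3).max(dim=2).values    # (batch, n_filters)
        return self.out(pooled)

def q_error(estimate: float, truth: float, eps: float = 1.0) -> float:
    """Symmetric ratio error, max(est/true, true/est) >= 1; eps keeps
    empty results from dividing by zero."""
    est, tru = max(estimate, eps), max(truth, eps)
    return max(est / tru, tru / est)

assert q_error(500.0, 2000.0) == 4.0  # a 4x under-estimate has q-error 4
```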