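Abstract: Large language models (LLMs) have become powerful tools for advancing the Text-to-SQL task. Studies have found that, under different evaluation metrics, the performance of LLM-based models differs markedly from that of fine-tuned models. The article therefore analyzes the shortcomings of test-suite execution accuracy (EXE) and exact set matching accuracy (ESM) when evaluating LLM-based Text-to-SQL models, and proposes an improved metric, EESM (Enhanced Exact Set Matching). Experimental results show that EXE and ESM exhibit false positive and false negative rates of up to 13.2% and 10.8%, respectively, whereas the false positive and false negative rates of EESM are only 0.2% and 1.8%, respectively, indicating that EESM provides a more accurate evaluation.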
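The two failure modes named in the abstract can be illustrated with a minimal sketch. This is not the paper's EXE, ESM, or EESM implementation: the `singer` table, the queries, and the helpers `execution_match` and `naive_exact_match` are all assumptions made for illustration, and the text comparison is only a crude stand-in for clause-level set matching.

```python
import sqlite3

def execution_match(conn, gold_sql, pred_sql):
    """Compare two queries by their result sets on ONE database instance."""
    gold_rows = sorted(conn.execute(gold_sql).fetchall())
    pred_rows = sorted(conn.execute(pred_sql).fetchall())
    return gold_rows == pred_rows

def naive_exact_match(gold_sql, pred_sql):
    """Compare two queries by whitespace- and case-normalized text only."""
    def normalize(sql):
        return " ".join(sql.lower().split())
    return normalize(gold_sql) == normalize(pred_sql)

# Hypothetical toy database (not taken from any benchmark).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bob", 45)])

gold = "SELECT name FROM singer WHERE age > 40"
# Different meaning, but the same rows on this particular data.
wrong_but_same_rows = "SELECT name FROM singer WHERE age > 35"
# Same meaning, but different surface form.
right_but_different_text = "SELECT name FROM singer WHERE NOT (age <= 40)"

# Execution-based check accepts a wrong query: a false positive.
false_positive = execution_match(conn, gold, wrong_but_same_rows)
# Text-based check rejects a correct query: a false negative.
false_negative = not naive_exact_match(gold, right_but_different_text)
```

On this toy data both flags come out true, which is the general pattern the abstract attributes to execution-based and matching-based metrics; the reported percentages themselves come from the paper's own experiments, not from this sketch.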
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232023D-19).
Abstract: Text-to-SQL is the task of translating a natural language query into a structured query language. Existing text-to-SQL approaches focus on improving the model’s architecture while ignoring the relationship between queries and table schemas and the differences in difficulty between examples in the dataset. To tackle these challenges, this paper proposes a two-stage curriculum learning framework for text-to-SQL (TSCL-SQL). To exploit the relationship between queries and table schemas, a schema identification pre-training task is proposed in which the model chooses the correct table schema for a given query from a set of candidates. To leverage the differences in difficulty between examples, curriculum learning is applied to the text-to-SQL task, together with an automatic curriculum learning solution consisting of a difficulty scorer and a training scheduler. Experiments show that the proposed framework is effective.
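The pairing of a difficulty scorer with a training scheduler can be sketched as follows. The abstract does not specify how TSCL-SQL scores difficulty or schedules training, so everything here is an assumption: `difficulty_score` is a hypothetical keyword-counting heuristic, and `curriculum_stages` is a hypothetical scheduler that exposes a growing, easiest-first share of the data. The schema identification pre-training stage is not covered.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (question, SQL); a hypothetical data layout

SQL_KEYWORDS = ("join", "group by", "order by", "having",
                "union", "intersect", "except")

def difficulty_score(sql: str) -> int:
    """Hypothetical scorer: count complex keywords plus nested SELECTs."""
    text = " ".join(sql.lower().split())
    nested_selects = max(text.count("select") - 1, 0)
    return sum(text.count(k) for k in SQL_KEYWORDS) + nested_selects

def curriculum_stages(examples: List[Example],
                      num_stages: int) -> List[List[Example]]:
    """Hypothetical scheduler: stage i sees the easiest i/num_stages of the data."""
    ordered = sorted(examples, key=lambda ex: difficulty_score(ex[1]))
    stages = []
    for i in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * i // num_stages)
        stages.append(ordered[:cutoff])
    return stages

# Made-up examples for illustration only.
examples = [
    ("q hard", "SELECT name FROM singer WHERE age > "
               "(SELECT AVG(age) FROM singer) ORDER BY name"),
    ("q easy", "SELECT name FROM singer"),
    ("q mid", "SELECT country, COUNT(*) FROM singer GROUP BY country"),
]
stages = curriculum_stages(examples, num_stages=3)
```

In a training loop, each stage would be used for some number of epochs before moving to the next, so the model sees simple queries first and the full dataset last.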
Abstract: This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity than those of the other two benchmarks. This underscores the need for more intricate benchmarks to simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks. We used 11 distinct large language models (LLMs) to generate SQL queries based on the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the database schema of TPC-DS. Our findings indicate that current state-of-the-art generative AI models fall short in generating accurate decision-making queries. We compared the generated queries with the TPC-DS gold standard queries using a series of fuzzy structure matching techniques based on query features. The results demonstrated that the accuracy of the generated queries is insufficient for practical real-world application.
Abstract: Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequency queries. Although text-to-SQL has emerged as an important task, recent studies have paid little attention to the task over aggregate tables. The growing number of aggregate tables brings two challenges: (1) the mapping between natural language questions and relational databases suffers from more ambiguity; (2) modern models usually adopt a self-attention mechanism to encode the database schema and question, and this mechanism has quadratic time complexity, which makes inference more time-consuming as the input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamic pruning strategy to discard unrelated items that are common for aggregate tables. We also construct a new large-scale dataset, SpiderwAGG, extended from the Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method, with a 4% increase in accuracy and a 15% decrease in inference time w.r.t. a strong baseline, RAT-SQL.
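The quadratic cost mentioned in the abstract can be made concrete with a small arithmetic sketch. It does not reproduce WAGG's pruning strategy; the function `attention_pairs` and the token and schema-item counts are made-up assumptions that only show how the number of pairwise attention scores shrinks when schema items are discarded before encoding.

```python
def attention_pairs(num_question_tokens: int, num_schema_items: int) -> int:
    """Pairwise attention scores for one full self-attention layer and head."""
    n = num_question_tokens + num_schema_items
    return n * n

# Hypothetical sizes: 20 question tokens, 200 schema items before pruning
# and 80 after pruning unrelated items.
full = attention_pairs(20, 200)    # 220 * 220 pairs
pruned = attention_pairs(20, 80)   # 100 * 100 pairs
reduction = 1 - pruned / full      # fraction of pairwise scores avoided
```

Because the cost grows with the square of the sequence length, removing a little over half of the input items in this made-up case avoids roughly four fifths of the pairwise scores; actual savings depend on the model and on how much of the schema can be pruned.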