Abstract
To address the decline in query efficiency of distributed databases as data volume grows, this work takes the Greenplum distributed database as its subject and proposes a cost-based method for generating an optimal query plan by optimizing the query path. First, a cost model is designed to estimate query cost. Next, a parallel max-min ant colony algorithm searches for the join order with the minimum query cost, i.e., the optimal join order. Finally, the optimal query plan is obtained from Greenplum's default optimal choices for the individual operations in the query plan. Multiple sets of experiments were run on a self-generated data set and on the standard data set of the Transaction Processing Performance Council Benchmark H (TPC-H). The results show that the proposed method effectively finds the optimal solution and yields the optimal query plan, thereby improving the query efficiency of Greenplum.
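The join-order search named in the abstract can be illustrated with a minimal, sequential sketch of a max-min ant system. The abstract does not specify the cost model, the pheromone update rule, or the parallelization scheme, so everything here is an assumption made for illustration: the toy cost (sum of intermediate result sizes for a left-deep plan), the function names `plan_cost` and `mmas_join_order`, the parameter values, and the example tables are all hypothetical and are not the authors' method.

```python
import random

def plan_cost(order, card, sel):
    """Toy cost (assumption): sum of intermediate result sizes of a left-deep plan."""
    rows = card[order[0]]
    joined = [order[0]]
    total = 0.0
    for t in order[1:]:
        rows *= card[t]
        # Apply the selectivity of every join predicate between t and the
        # tables already joined; a missing predicate means a cross product.
        for j in joined:
            rows *= sel.get(frozenset((j, t)), 1.0)
        joined.append(t)
        total += rows
    return total

def mmas_join_order(card, sel, n_ants=10, n_iter=50, rho=0.1,
                    tau_min=0.01, tau_max=1.0, seed=0):
    """Sequential max-min ant system over join orders (illustrative only)."""
    rng = random.Random(seed)
    tables = sorted(card)
    # tau[(a, b)]: pheromone for joining b directly after a.
    # Trails start at tau_max, as is usual for max-min ant systems.
    tau = {(a, b): tau_max for a in tables for b in tables if a != b}
    best_order, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            order = [rng.choice(tables)]
            remaining = [t for t in tables if t != order[0]]
            while remaining:
                weights = [tau[(order[-1], t)] for t in remaining]
                nxt = rng.choices(remaining, weights=weights)[0]
                order.append(nxt)
                remaining.remove(nxt)
            cost = plan_cost(order, card, sel)
            if cost < best_cost:
                best_order, best_cost = order, cost
        # Evaporate all trails, reinforce only the best-so-far order,
        # then clamp every trail to [tau_min, tau_max].
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for a, b in zip(best_order, best_order[1:]):
            tau[(a, b)] += rho * tau_max
        for edge in tau:
            tau[edge] = min(tau_max, max(tau_min, tau[edge]))
    return best_order, best_cost

# Hypothetical statistics: table cardinalities and join selectivities.
card = {"A": 1000, "B": 100, "C": 10, "D": 500}
sel = {frozenset(("A", "B")): 0.01,
       frozenset(("B", "C")): 0.1,
       frozenset(("C", "D")): 0.002}
order, cost = mmas_join_order(card, sel)
print(order, cost)
```

The pheromone bounds are what distinguish a max-min ant system from a basic ant system: the lower bound keeps every join order reachable, and the upper bound limits premature convergence. Each ant builds its order independently within an iteration, which is the natural place to parallelize, although this sketch runs the ants one after another.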
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2018, Issue 2, pp. 478-482 (5 pages)
Funding
National Natural Science Foundation of China (61503289)
Hubei Province Science and Technology Support Program (2015BAA120, 2015BCE068)