摘要
随着图模型规模的扩大,单机算法难以适应大规模数据集下的子图查询.而现有的分布式算法基于无索引的简单遍历,join过程容易出现内存溢出,而且查询图分布异常时易出现负载不均衡.提出了一种基于谱编码的二叉索引树(SCBT-index),首先对数据图中的顶点谱编码,根据编码信息构建二叉索引树.然后对查询图使用最小查询计划进行分解,最后join过程使用3个剪枝策略:基于拓扑结构的预剪枝、序列化join和基于分布式下的join优化.实验结果表明,SCBT-index在图集下的综合性能优于现有主流算法,单图下的查询时间为现有算法的1/2到1/4.
With the expansion of graph scale,single-machine algorithms can hardly adapt to the sub-graph queries in large-scale data sets.As existing distributed algorithms are based on simple traversal without index,the join process is prone to memory overflow in the distributed algorithms and load imbalance occurs when the query graph distribution is abnormal.Therefore,a binary index tree based on spectral coding named SCBT-index is proposed.Firstly,for vertex spectrum coding in the data graph,a binary index tree is constructed according to the coding information.Then,the query graph is decomposed using the minimum query plan.Finally,three pruning strategies are used in the join process:structure matching based on topological structure,serialized join and the distributed join optimization.The experimental results show the comprehensive performance of SCBT-index under the graph set is better than that of the popular algorithms.In addition,the query time under the single graph is 1/2 to 1/4 of that of the existing algorithms.
作者
施炜杰
董一鸿
钱江波
陈华辉
辛宇
SHI Wei-jie;DONG Yi-hong;QIAN Jiang-bo;CHEN Hua-hui;XIN Yu(Faculty of Electrical Engineering and Computer Science,Ningbo University,Ningbo,Zhejiang 315211,China)
出处
《电子学报》
EI
CAS
CSCD
北大核心
2020年第1期110-117,共8页
Acta Electronica Sinica
基金
国家自然科学基金(No.61572266,No.61602133)
浙江省自然科学基金(No.LY20F020009,No.LZ20F020001)