Abstract
To store and manage large-scale Semantic Web data efficiently, this paper proposes an improved HBase-based storage method for Resource Description Framework (RDF) data that takes the characteristics of Semantic Web queries into account. The method stores RDF data indexed by subject+predicate, predicate+object, and object+subject in three index tables: SP_O, PO_S, and OS_P. The PO_S table is further divided by class into two kinds of tables, P_SO and P_OS, and an improved query indexing method is given. For data loading, HBase's built-in BulkLoad tool is used to upload the data into the HBase storage tables. Theoretical analysis and experimental results show that the improved storage method responds quickly to queries with a fixed predicate, and that parallel loading with BulkLoad achieves a high speedup, shortening data loading time while improving the overall storage performance of the system.
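The three-table index layout described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the row-key separator, the function names (`index_rows`, `build_tables`, `choose_table`), and the use of Python dictionaries in place of HBase tables are all assumptions made for illustration, and the further split of PO_S into P_SO and P_OS is not modeled because the abstract does not give its details.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SEP = "\x00"  # hypothetical row-key separator, not taken from the paper


def index_rows(triple: Triple) -> Dict[str, Tuple[str, str]]:
    """Return the (row key, value) pair for each of the three index tables."""
    s, p, o = triple
    return {
        "SP_O": (s + SEP + p, o),  # key: subject+predicate, value: object
        "PO_S": (p + SEP + o, s),  # key: predicate+object, value: subject
        "OS_P": (o + SEP + s, p),  # key: object+subject, value: predicate
    }


def build_tables(triples: Iterable[Triple]) -> Dict[str, Dict[str, Set[str]]]:
    """Store every triple in all three tables (dicts stand in for HBase tables)."""
    tables: Dict[str, Dict[str, Set[str]]] = {"SP_O": {}, "PO_S": {}, "OS_P": {}}
    for triple in triples:
        for name, (key, value) in index_rows(triple).items():
            tables[name].setdefault(key, set()).add(value)
    return tables


def choose_table(s: Optional[str], p: Optional[str], o: Optional[str]) -> str:
    """Pick the table whose row key starts with the bound terms of the pattern."""
    if s is not None and p is not None:
        return "SP_O"
    if p is not None and o is not None:
        return "PO_S"
    if o is not None and s is not None:
        return "OS_P"
    if s is not None:
        return "SP_O"  # prefix scan on subject
    if p is not None:
        return "PO_S"  # prefix scan on predicate
    if o is not None:
        return "OS_P"  # prefix scan on object
    return "SP_O"      # nothing bound: any table works, full scan


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:knows", "ex:carol"),
    ]
    tables = build_tables(data)
    table = choose_table(None, "foaf:knows", "ex:bob")
    print(table, tables[table]["foaf:knows" + SEP + "ex:bob"])
```

In this sketch a pattern with two bound terms maps to a single row lookup, while a pattern with one bound term maps to a prefix scan over the table whose key begins with that term.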
Authors
ZHU Dao-heng (朱道恒)
QIN Xue (秦学)
LIU Jun-feng (刘君凤)
College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China
Source
Software (《软件》), 2019, No. 12, pp. 13-17 (5 pages)
Funding
Guizhou University, grant No. 贵大人基合字(2014)33号