期刊文献+

大数据环境下支持概率数据范围查询索引的研究 被引量:2

Indexing Probabilistic Data for Supporting Range Query Over Big Data
在线阅读 下载PDF
导出
摘要 随着数据规模的不断增长,大数据管理具有重要意义.在众多数学模型中,因为概率模型可以将海量数据抽象成少量概率数据,所以它非常适合管理大数据.因此,研究大数据环境下的概率数据管理具有重要意义.作为一种经典查询,基于概率数据的范围查询已被深入研究.然而,当前研究成果不适合在大数据环境下使用.其根本原因是这些索引的更新代价较大.该文提出了索引HGD-Tree解决这一问题.首先,该文提出了一系列算法降低新增数据的处理代价.它可以保证树结构平衡的前提下快速地执行插入、删除、更新等操作.其次,该文提出了一种基于划分的方法构建概率对象的概要信息.它可以根据概率密度函数的特点自适应地执行划分.此外,由于作者提出的概要是基于比特向量,上述策略可以保证索引以较低空间代价管理概率数据.最后,该文提出了一种基于位运算的方法访问HGD-Tree.它可以用少量的位运算执行过滤操作.大量的实验验证了算法的有效性. With the increasing of data scale, big data management is great significant. Underlying the popular mathematical models, probabilistic model is suitable for big data management since it could compress volume of data into a few probabilistic data. Therefore, it is significant for studying the problem of probabilistic data management over big data environment. As a classic query, range query over probabilistic data has been fully studied. However, the state of art efforts are not suitable since they all suffer from highly updating cost. In this paper, we propose a novel index named HGD-Tree for solving this problem. First of all, we propose a group of novel strategies for handling newly arrival objects. In this way, we could efficiently apply the insertion, deletion, and updating on the premise of balancing tree structure. In addition, we propose a novel partition-based structure to approach the probability density function of object, where the structure could self-adjust the partition resolution so as to cater for the underlying of uncertain data. Besides, our proposed structure is expressed by a few bit vectors. The above two strategies guarantee low space cost of the proposed index. Last but not least, we propose a novel algorithm for supporting the range query which could effectively apply the analysis and extensive experimental results algorithms. pruning under few bitwise operations. Theoretical demonstrate the effectiveness of the proposed
出处 《计算机学报》 EI CSCD 北大核心 2016年第10期1929-1946,共18页 Chinese Journal of Computers
基金 国家"九七三"重点基础研究发展规划项目基金(2012CB316201) 国家自然科学基金(61272178 61572122 61173031 61129002 61532021 U1401256) 国家优秀青年科学基金(61322208)资助~~
关键词 大数据 概率数据 索引 概率概要信息 多分辨率网格 big-data range query index summary multi-resolution grid
  • 相关文献

参考文献1

二级参考文献98

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:163
  • 2谷峪,于戈,张天成.RFID复杂事件处理技术[J].计算机科学与探索,2007,1(3):255-267. 被引量:54
  • 3Deshpande A, viprin C, Madden S, Hellerstein J M, Hong W. Model-driven data acquisition in sensor networks// Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, 2004:588-599
  • 4Madhavan J, Cohen S, Xin D, Halevy A, Jeffery S, Ko D, Yu C. Web-scale data integration: You can afford to pay as you go//Proceedings of the 33rd Biennial Conference on Innovative Data Systems Research. Asilomar, 2007:342-350
  • 5Liu Ling. From data privacy to location privacy: Models and algorithms (tutorial)//Proceedings of the 33rd International Conference on Very Large Data bases. Vienna, 2007: 1429- 1430
  • 6Samarati P, Sweeney L. Generalizing data to provide anonymity when disclosing information (abstract)//Proeeedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Seattle, 1998:188
  • 7Cavallo R, Pittarelli M. The theory of probabilistic databases//Proceedings of the 13th International Conference on Very Large Data Bases. Brighton, 1987:71-81
  • 8Barbara D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487-502
  • 9Fuhr N, Rolleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32-66
  • 10Zimanyi E. Query evaluation in probabilistic databases. Theoretical Computer Science, 1997, 171(1-2): 179-219

共引文献185

同被引文献24

引证文献2

二级引证文献25

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部