期刊文献+

一种可扩展的面向海量数据高维最近邻检索的对等索引结构

A Scalable Peer-to-Peer Indexing Scheme for High Dimensional Nearest-neighbor Search in Massive Datasets
在线阅读 下载PDF
导出
摘要 大规模数据集的最近邻检索,目前逐渐成为计算机领域中一个重要问题.采用一种分布式对等索引结构,对海量数据集进行最近邻检索.通过采用lp范数下的局部敏感哈希算法对高维空间的数据进行相似检索,并利用典型的哈希算法与不均匀Hilbert曲线结合,将高维的局部敏感哈希数据桶空间映射到一维DHT索引空间.系统设计时同时考虑相似性检索和P2P网络维持的需求,索引本身具备局部敏感特性,以及DHT网络的负载均衡能力.文中将展示如何利用局部敏感哈希有效地在P2P网络中执行最近邻搜索问题.实验基于真实数据,进一步验证本方法的有效性,以及扩展性上相比于其他方法的优势. Nearest-Neighbor ( NN ) search is an increasingly important problem in many computer applications, in which high dimen- sional massive datasets are involved. In this paper, we propose a P2P distributed indexing scheme to perform NN search in massive datasets. We consider the locality sensitive hashing ( LSH ) scheme under lp-metric to support similarity search in high dimensional space and we put forward a novel mapping from the multi-dimensional LSH bucket space to the one dimension DHT key space using non-uniform Hilbert Curve. We consider the requirements of both similarity search and the maintenance of P2P network. Our inde- xing scheme preserves the locality sensitive property of LSH indexes and guarantees load balance in DHT networks. We also show how to leverage the LSH indexing scheme to efficiently process NN search in P2P network. Experiments on real world data confirm the effectiveness and efficiency of our approach and show the scalability gains compared to state-of-the-art schemes.
出处 《小型微型计算机系统》 CSCD 北大核心 2014年第4期765-769,共5页 Journal of Chinese Computer Systems
基金 国家自然科学基金项目(60975045)资助 国家科技支撑计划课题项目(2011BAH11B01)资助 中科院先导专项项目(XDA06030900)资助
关键词 最近邻搜索 局部敏感哈希 分布式哈希表 HILBERT曲线 nearest neighbor search locality sensitive hashing DHT hilbert curve
  • 相关文献

参考文献13

  • 1Haghani P, Michel S, Aberer K. Distributed similarity search in high dimensions using locality sensitive hashing [ C ]. In Proceedings of the 12th International Conference on Extending Database Technolo- gy ( EDBT 09 ), ACM,2009:744-755.
  • 2Stoica I, Morris R, Karger D, et al. Chord: a scalable peer-to-peer lookup service for Internet applications [ C ]. Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Proto- cols for Computer Communications ( SIGCOMM 01 ), ACM, 2001 : 149-160.
  • 3Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions [ C ]. Proceedings of the Twentieth Annual Symposium on Computational Geometry ( SCG 04), ACM ,2004:253-262.
  • 4Bawa M, Condie T, Ganesan P. LSH forest: self-tuning indexes for similarity search[ C]. Proceedings of the 14th International Confer- ence on World Wide Web ( WWW 05 ), ACM,2005:651-660.
  • 5Lv Q, Josephson M, Wang Z, et al. Multi-probe LSH : efficient inde- xing for high-dimensional similarity search[C]. Proceedings of the 33rd International Conference on Very Large Databases (VLDB 07) , ACM,2007:950-961.
  • 6Weber R,Schek H J,Blott S. A quantitative analysis and perform- ance study for similarity-search methods in high-dimensional spaces [C]. Proceedings of the 24th International Conference on Very Large Databases ( VLDB 98 ), ACM,1998 : 194-205.
  • 7Indyk P,Motwani R. Approximate nearest neighbor: towards remo- ving the curse of dimensionality [ C ]. Proceedings of the Symposi- um on Theory of Computing( STOC 98 ), ACM, 1998:604-613.
  • 8Sagan H. Space-filling curves[ M]. Springer-Verlag, 1994.
  • 9Zhang X, Shou L, Tan K, et al. iDISQUE: tuning high-dimensional similarity search in DHT networks[ C]. Proceedings of Internation- al Conference on Database Systems for Advanced Applications (DASFAA) ,2010 : 19-33.
  • 10Quang Hieu Vu, Mihai Lupu, Sai Wu. SiMPSON:efficient simi- larity search in metric spaces over P2P structured overlay networks [ C ]. Parallel Processing, Euro-Par, 2009:498-510.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部