Approaches for Scaling DBSCAN Algorithm to Large Spatial Databases (times cited: 13)

Abstract: The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and it has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have worked on clustering algorithms for decades and many such algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, the DBSCAN algorithm performs well in spatial data clustering. However, for large spatial databases, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. First, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Following that, a synthetic algorithm based on the above algorithms is also given. Finally, experimental results are presented to demonstrate the effectiveness and efficiency of these algorithms.
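To make the baseline concrete, the following is a minimal sketch of the original DBSCAN procedure that the paper sets out to scale, assuming 2-D points held entirely in memory and Euclidean distance. It is not the paper's fast, sampling-based, partitioning-based, or parallel variant; the function names `dbscan` and `region_query` and the parameter names `eps` and `min_pts` are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Return indices of all points within distance eps of points[idx], itself included.

    This brute-force scan touches every point on every call, i.e. it operates
    directly on the entire data set (illustrative; no spatial index is used).
    """
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet classified
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable non-core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point: its unassigned neighbors join the cluster.
                seeds.extend(q for q in j_neighbors if labels[q] is None or labels[q] == NOISE)
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

On the sample data, the two tight groups of four points become clusters 0 and 1 and the isolated point is labeled noise. Because each `region_query` call scans every point, a naive run costs O(n²) distance computations and needs the whole point set in memory, which is the cost profile the abstract attributes to running DBSCAN directly on the entire database. A spatial index such as an R*-tree is the standard way to reduce the per-query cost.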
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2000, No. 6, pp. 509-526 (18 pages). Indexed in SCIE, EI, and CSCD.
Funding: This work was supported by the National Natural Science Foundation of China (No. 69743001) and the National Doctoral Subject Fou
Keywords: spatial database, clustering, fast DBSCAN algorithm, data sampling, data partitioning, parallel

Co-cited documents: 84; second-level citing documents: 72
