Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine sampling techniques with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
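To make the "sampling outside DBSCAN" idea concrete, here is a minimal sketch under stated assumptions; it is not the SDBSCAN algorithm from the paper, whose details the abstract does not give. It runs a plain brute-force DBSCAN on a random sample and then labels each remaining point with the cluster of its nearest clustered sample point, provided that point lies within eps. The function names, the `sample_size` and `seed` parameters, and the nearest-neighbor assignment rule are all illustrative assumptions.

```python
import math
import random

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (NOISE for outliers)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Hypothetical wrapper: cluster a random sample, then assign the rest."""
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), min(sample_size, len(points)))
    sample = [points[i] for i in sample_idx]
    sample_labels = dbscan(sample, eps, min_pts)

    labels = [NOISE] * len(points)
    for i, lab in zip(sample_idx, sample_labels):
        labels[i] = lab

    sampled = set(sample_idx)
    clustered = [(p, lab) for p, lab in zip(sample, sample_labels) if lab != NOISE]
    for i, p in enumerate(points):
        if i in sampled or not clustered:
            continue
        # Assumed assignment rule: nearest clustered sample point within eps.
        nearest, lab = min(clustered, key=lambda t: math.dist(p, t[0]))
        if math.dist(p, nearest) <= eps:
            labels[i] = lab
    return labels
```

Because a sample is sparser than the full data, `min_pts` would usually need to be lowered (or `eps` raised) in proportion to the sampling rate; that tuning is left out of this sketch.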
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have worked on clustering for decades and many clustering algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, DBSCAN performs well in spatial data clustering. However, for large spatial databases, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale DBSCAN to large spatial databases. First, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Building on these algorithms, a synthetic algorithm that combines them is also given. Finally, experimental results are presented to demonstrate the effectiveness and efficiency of these algorithms.
The density-based clustering algorithm presented differs from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) and has the following advantages. First, a greedy algorithm replaces the R*-tree (Beckmann et al., 1990) used in DBSCAN to index the clustering space, so that the clustering time is greatly reduced and the I/O and memory load is reduced as well. Second, the merging condition for approximating arbitrarily shaped clusters is designed carefully, so that a single threshold can correctly distinguish all clusters in a large spatial dataset even when it contains some density-skewed clusters. Finally, the authors investigate a robotic navigation application and test two artificial datasets with the proposed algorithm to verify its effectiveness and efficiency.
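Alignment monitoring of steel arch bridges is an important component of bridge health monitoring systems. Using three-dimensional laser scanning, the traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm is improved by incorporating the random sample consensus (RANSAC) algorithm, and the improved algorithm is used to extract the arch rib alignment of a steel arch bridge. Three-dimensional laser point cloud data are comprehensive and rich in detail, and can fully represent the shape and deformation of the bridge structure. The RANSAC-enhanced DBSCAN algorithm constrains the clustering results according to the structural characteristics of the steel arch bridge, and effectively removes scattered points as well as the point clouds of the deck, transverse braces, lateral bracing, and web members. Key points are fitted from the point cloud extracted by the improved algorithm and compared with manually extracted results: the extraction errors of the arch rib key points are all at the millimeter level, with a maximum error of 9.2 mm and a minimum error of 0.1 mm. The method extracts the steel arch bridge alignment more accurately and effectively, achieves millimeter-level precision, greatly reduces labor and time costs, is more robust to the complex structure of steel arch bridges, and is well suited to practical production needs.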
A convective and stratiform cloud classification method for weather radar is proposed based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. To identify convective and stratiform clouds in different developmental phases, two-dimensional (2D) and three-dimensional (3D) models are proposed by applying reflectivity factors at 0.5° and at 0.5°, 1.5°, and 2.4° elevation angles, respectively. According to the thresholds of the algorithm, which include echo intensity, the echo top height of 35 dBZ (ET), density threshold, and ε-neighborhood, cloud clusters can be marked into four types: deep-convective cloud (DCC), shallow-convective cloud (SCC), hybrid convective-stratiform cloud (HCS), and stratiform cloud (SFC). Each cloud cluster type is further divided into a core area and a boundary area, which provides richer information on cloud structure. The algorithm is verified using volume scan data observed with new-generation S-band weather radars in Nanjing, Xuzhou, and Qingdao. The results show that the improved DBSCAN algorithm intuitively identifies cloud clusters as core and boundary points, whose areas change continuously during convective evolution. Therefore, the occurrence and disappearance of convective weather can be estimated in advance by observing changes in the classification. Because the density thresholds differ and multiple elevations are used in the 3D model, the identified echo types and areas differ between the 2D and 3D models. The 3D model identifies larger convective and stratiform clouds than the 2D model. However, small developing convective clouds at lower heights cannot be identified with the 3D model because they are covered by thick stratiform clouds. In addition, the 3D model can avoid the influence of the melting layer and better indicate convective clouds in the developmental stage.
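The distribution network environment is complex, and distribution network synchronous phasor measurement units (D-PMUs) are prone to interference that produces bad data, which in turn degrades applications built on the measurements. To improve D-PMU data quality, a bad-data detection method for distribution network synchronized measurements is proposed that is based on density-based spatial clustering of applications with noise (DBSCAN) and does not depend on system topology. First, the density-based clustering algorithm DBSCAN is used to detect anomalous data. The DBSCAN clustering results are evaluated jointly using the silhouette coefficient and the Dunn index. The sparrow search algorithm is used for adaptive parameter tuning, which removes the need to pre-process training data and labeled data before detection. On this basis, the K-Medoids time series clustering algorithm is combined with dynamic time warping to measure the similarity between time series, which solves the problem of distinguishing disturbance data from bad data when the D-PMUs are weakly coupled electrically, and improves accuracy and robustness in noisy environments. Tests on simulated and field data show that the proposed method effectively distinguishes genuine disturbance data and accurately identifies D-PMU bad data.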
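As a rough illustration of the time series similarity measure this abstract refers to, the following is a minimal sketch of the classic dynamic time warping (DTW) distance. It is the textbook formulation with an absolute-difference local cost and no window constraint, not the paper's specific variant; the function name `dtw_distance` is an assumption made here for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses |x - y| as the local cost and allows unrestricted warping.
    """
    inf = float("inf")
    # prev[j] is the best cumulative cost of aligning the processed prefix
    # of `a` with the first j items of `b`; only two rows are kept.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j],      # advance in a only
                                       curr[j - 1],  # advance in b only
                                       prev[j - 1])  # advance in both
        prev = curr
    return prev[len(b)]
```

A pairwise matrix of such distances could then be passed to a K-Medoids routine, since K-Medoids only needs dissimilarities between items rather than coordinates.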
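Track initiation is the first step in group target tracking. Conventional track initiation algorithms produce a large number of false tracks when applied to group targets, while traditional group target initiation algorithms have poor clutter rejection and do not consider overlapping groups. A group initiation algorithm based on a cyclic Hough transform and the density-based spatial clustering of applications with noise (DBSCAN) algorithm is therefore proposed. The algorithm projects plots from multiple scans into parameter space using a randomized Hough transform and, exploiting the consistent motion characteristics of group targets, extracts by clustering the group with the largest threshold value. Because the parameter accumulation of one group can affect other groups or targets, the randomized Hough transform is repeated after each extraction, extracting the group with the largest threshold value in turn until none remain. Finally, the extracted groups are segmented with DBSCAN to complete group initiation. Simulations show that the algorithm has strong clutter rejection, can handle the initiation of dense groups, and has a modest computational cost, making it suitable for engineering applications.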
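Obstacle detection from road point cloud data is critical in intelligent transportation systems and autonomous driving. The traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm performs poorly on high-dimensional data or data with regions of differing density, because its distance measure is inefficient and its parameter combination is hard to determine. A road obstacle point cloud clustering method based on an improved DBSCAN is therefore proposed. First, an isolation kernel function is used in place of the traditional distance measure when determining the Eps neighborhood, which improves the adaptability and accuracy of DBSCAN across regions of different density. Second, to address the shortcomings of the Cheetah Optimizer (CO) in information sharing and iterative updating, a CO variant based on timely updating mechanisms and compatible metric strategies (Timely Updating Mechanisms and Compatible Metric Strategies for CO Algorithms, TCCO) is proposed: real-time update operations ensure that the best information from each iteration is shared promptly, and during global updates the elimination mechanism is refined using non-dominated sorting and crowding distance, which balances global exploration and local exploitation and improves convergence speed and accuracy. Finally, the isolation measure is used to improve the Eps neighborhood and TCCO is used to optimize DBSCAN clustering with adaptively determined parameters, improving clustering accuracy and efficiency. In tests on eight UCI datasets, simulation results show that the proposed TCCO-DBSCAN algorithm clearly improves F-Measure, ARI, and NMI over CO-DBSCAN, SSA-DBSCAN, DBSCAN, and KMC, with better clustering accuracy. Experiments on obstacle clustering of LiDAR point cloud data confirm that TCCO-DBSCAN adapts effectively to changes in point cloud density and yields better road obstacle clustering, supporting obstacle detection in driver assistance.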
Funding (sampling-based DBSCAN abstract): Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Funding (scaling DBSCAN to large spatial databases abstract): This work was supported by the National Natural Science Foundation of China (No. 69743001), the National Doctoral Subject Fou
Funding (weather radar cloud classification abstract): Funded by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B1111200001), the Key Project of Monitoring, Early Warning and Prevention of Major Natural Disasters of China (Grant No. 2019YFC1510304), the S&T Program of Hebei (Grant No. 19275408D), and the Scientific Research Projects of Weather Modification in Northwest China (Grant No. RYSY201905).