Clustering is a useful data mining technique for discovering interesting distributions and patterns in the underlying data, with applications in statistical data analysis, pattern recognition, image processing, and other fields. We combine sampling with the DBSCAN algorithm to cluster large spatial databases and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One introduces sampling inside DBSCAN; the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that both algorithms are effective and efficient for clustering large-scale spatial databases.
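As a rough illustration of the second variant (sampling outside DBSCAN), the sketch below clusters a random sample with plain DBSCAN and then attaches each unsampled point to the cluster of the nearest sampled core point within eps. The abstract does not say how SDBSCAN propagates labels, so this assignment rule, the function names (`dbscan`, `sampled_dbscan`), and the toy data are assumptions for illustration, not the authors' algorithm.

```python
import math
import random


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 marks noise."""
    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        core[i] = True
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core[j] = True
                queue.extend(j_neighbors)
    return labels, core


def sampled_dbscan(points, eps, min_pts, sample_fraction, seed=0):
    """Hypothetical 'sampling outside DBSCAN': cluster a sample, then extend labels."""
    rng = random.Random(seed)
    n_sample = max(1, int(len(points) * sample_fraction))
    sample_idx = rng.sample(range(len(points)), n_sample)
    sample = [points[i] for i in sample_idx]
    sample_labels, sample_core = dbscan(sample, eps, min_pts)

    labels = [-1] * len(points)
    for k, i in enumerate(sample_idx):
        labels[i] = sample_labels[k]

    # Assumed rule: an unsampled point joins the nearest sampled core point within eps.
    core_reps = [(sample[k], sample_labels[k]) for k in range(n_sample) if sample_core[k]]
    sampled = set(sample_idx)
    for i, p in enumerate(points):
        if i in sampled:
            continue
        best = None
        for q, label in core_reps:
            d = math.dist(p, q)
            if d <= eps and (best is None or d < best[0]):
                best = (d, label)
        if best is not None:
            labels[i] = best[1]
    return labels


# Toy data: two tight blobs and one isolated point.
blob_a = [(0.01 * k, 0.0) for k in range(50)]
blob_b = [(10.0 + 0.01 * k, 10.0) for k in range(50)]
points = blob_a + blob_b + [(5.0, 5.0)]
labels = sampled_dbscan(points, eps=1.0, min_pts=5, sample_fraction=0.8)
```

Because sampling lowers the point density, `min_pts` has to be chosen for the sampled data rather than the full database; here only the sample pays the quadratic cost of the naive `region_query`.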
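Alignment monitoring of steel arch bridges is an important part of a bridge health monitoring system. Using 3D laser scanning, this work improves the traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm by integrating the random sample consensus (RANSAC) algorithm, and applies the result to extract the arch-rib alignment of a steel arch bridge. 3D laser point cloud data are comprehensive and rich in detail, so they capture the full shape and deformation of the bridge structure. The RANSAC-integrated improved DBSCAN algorithm constrains the clustering results according to the structural characteristics of the steel arch bridge, which effectively removes scattered points as well as points belonging to the deck, cross braces, lateral bracing, and web members. Key points were fitted from the point cloud extracted by the improved algorithm and compared with manually extracted results: all arch-rib key-point extraction errors were at the millimeter level, with a maximum error of 9.2 mm and a minimum error of 0.1 mm. The method extracts the steel arch bridge alignment more accurately and effectively, reaches millimeter-level precision, greatly reduces labor and time costs, is more robust to the complex structure of steel arch bridges, and is well suited to practical production needs.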
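For readers unfamiliar with RANSAC, the following minimal sketch fits a 2D line to points contaminated by outliers. It shows only the generic RANSAC loop; how the paper couples RANSAC with DBSCAN or constrains clusters by bridge geometry is not described in the abstract, and the names `fit_line` and `ransac_line`, the threshold, and the toy points are assumptions.

```python
import math
import random


def fit_line(p, q):
    """Line through p and q as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    a, b = q[1] - p[1], p[0] - q[0]  # normal vector of the direction q - p
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])


def ransac_line(points, threshold, iterations=200, seed=0):
    """Return the (model, inliers) pair with the most points within threshold."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: two distinct points
        a, b, c = fit_line(p, q)
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers


# Toy data: 20 points on y = 2x + 1 plus three gross outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
noisy_points = line_points + [(0.0, 10.0), (5.0, -8.0), (9.0, 30.0)]
model, inliers = ransac_line(noisy_points, threshold=0.05)
```

The normalized form `a*x + b*y + c = 0` is used instead of slope and intercept so that vertical lines need no special case.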
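The distribution network environment is complex, and distribution network synchronous phasor measurement units (D-PMUs) are prone to interference that produces bad data, which in turn degrades applications built on the measurements. To improve D-PMU data quality, a topology-independent method for detecting bad data in distribution network synchronous measurements is proposed, based on density-based spatial clustering of applications with noise (DBSCAN). First, the density-based clustering algorithm DBSCAN is used to detect anomalous data. The DBSCAN clustering results are evaluated jointly with the silhouette coefficient and the Dunn index. The sparrow search algorithm tunes the parameters adaptively, which removes the need to preprocess training data and label data before detection. On this basis, the K-Medoids algorithm for time series clustering is combined with dynamic time warping; by measuring the similarity between different time series, this addresses the problem of distinguishing disturbance data from bad data when D-PMUs are only weakly electrically coupled, improving the accuracy of data processing and robustness in noisy environments. Tests on simulated and field data show that the proposed method effectively distinguishes genuine disturbance data and accurately identifies D-PMU bad data.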
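Dynamic time warping, the similarity measure named above, can be sketched with the textbook dynamic program below. This is the standard unconstrained formulation with absolute difference as the local cost; the abstract does not state which variant, window, or local cost the authors use, and the function name `dtw_distance` and the example series are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier element of b
                cost[i][j - 1],      # b[j-1] also matches an earlier element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


reference = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]  # same shape, shifted by one sample
spiked = [0, 1, 9, 1, 0]      # same timing, one corrupted sample

shift_cost = dtw_distance(reference, delayed)
spike_cost = dtw_distance(reference, spiked)
```

The delayed copy aligns at zero cost while the spiked series does not, which is the property that makes such a distance usable as the dissimilarity inside a K-Medoids clustering of time series.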
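Obstacle detection from road point cloud data is critical in intelligent transportation systems and autonomous driving. The traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm clusters poorly on high-dimensional data or data with regions of differing density, because its distance measure is inefficient and a suitable parameter combination is hard to determine. A road obstacle point cloud clustering method based on an improved DBSCAN is therefore proposed. First, when determining the Eps neighborhood, an isolation kernel function replaces the traditional distance measure, improving the adaptability and accuracy of DBSCAN clustering across regions of different density. Second, to address the shortcomings of the Cheetah Optimizer (CO) in information sharing and iterative updating, a CO variant based on a timely updating mechanism and a compatible metric strategy (Timely Updating Mechanisms and Compatible Metric Strategies for CO Algorithms, TCCO) is proposed. Real-time update operations ensure that good information from each iteration is shared promptly, and during global updating the elimination mechanism is optimized using non-dominated sorting and crowding distance, balancing global exploration and local exploitation and improving convergence speed and accuracy. Finally, the isolation measure is used to improve the Eps neighborhood and TCCO is used to optimize DBSCAN clustering, determining the parameters adaptively and improving clustering accuracy and efficiency. In tests on eight UCI datasets, simulation results show that the proposed TCCO-DBSCAN algorithm clearly improves the F-Measure, ARI, and NMI metrics over CO-DBSCAN, SSA-DBSCAN, DBSCAN, and KMC, with better clustering accuracy. Experiments on obstacle clustering of LiDAR point cloud data confirm that TCCO-DBSCAN adapts effectively to density variation in point cloud data and clusters road obstacles better, supporting obstacle detection in driver assistance.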
The huge amount of information stored in corporate databases (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering is a useful data mining technique for discovering interesting distributions and patterns in the underlying data, with applications in statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have worked on clustering for decades and developed many algorithms, there is still no efficient algorithm for clustering very large databases and high-dimensional data. DBSCAN, an outstanding representative of clustering algorithms, performs well on spatial data. For large spatial databases, however, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. This paper proposes several approaches for scaling DBSCAN to large spatial databases. First, a fast DBSCAN algorithm is developed that considerably speeds up the original. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. A synthetic algorithm based on these is also given. Finally, experimental results demonstrate the effectiveness and efficiency of the algorithms.
Funding (sampling-based DBSCAN paper): Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Funding (paper on scaling DBSCAN to large spatial databases): This work was supported by the National Natural Science Foundation of China (No. 69743001) the National Doctoral Subject Fou...