Finding clusters based on density represents a significant class of clustering algorithms.These methods can discover clusters of various shapes and sizes.The most studied algorithm in this class is theDensity-Based Sp...Finding clusters based on density represents a significant class of clustering algorithms.These methods can discover clusters of various shapes and sizes.The most studied algorithm in this class is theDensity-Based Spatial Clustering of Applications with Noise(DBSCAN).It identifies clusters by grouping the densely connected objects into one group and discarding the noise objects.It requires two input parameters:epsilon(fixed neighborhood radius)and MinPts(the lowest number of objects in epsilon).However,it can’t handle clusters of various densities since it uses a global value for epsilon.This article proposes an adaptation of the DBSCAN method so it can discover clusters of varied densities besides reducing the required number of input parameters to only one.Only user input in the proposed method is the MinPts.Epsilon on the other hand,is computed automatically based on statistical information of the dataset.The proposed method finds the core distance for each object in the dataset,takes the average of these distances as the first value of epsilon,and finds the clusters satisfying this density level.The remaining unclustered objects will be clustered using a new value of epsilon that equals the average core distances of unclustered objects.This process continues until all objects have been clustered or the remaining unclustered objects are less than 0.006 of the dataset’s size.The proposed method requires MinPts only as an input parameter because epsilon is computed from data.Benchmark datasets were used to evaluate the effectiveness of the proposed method that produced promising results.Practical experiments demonstrate that the outstanding ability of the proposed method to detect clusters of different densities even if there is no separation between them.The accuracy of the method ranges from 92%to 100%for the experimented datasets.展开更多
经典DBSCAN(density based spatial clustering of applications with noise)算法需要人工指定邻域半径(Eps)和点数阈值(Minpts),且均为全局参数,导致聚类准确率低。针对此问题,为了提高经典DBSCAN聚类算法的聚类准确率,基于网格划分思...经典DBSCAN(density based spatial clustering of applications with noise)算法需要人工指定邻域半径(Eps)和点数阈值(Minpts),且均为全局参数,导致聚类准确率低。针对此问题,为了提高经典DBSCAN聚类算法的聚类准确率,基于网格划分思想,提出了一种局部自适应DBSCAN聚类算法。根据数据集自身特征生成网格空间,将特征数据映射至相应的网格空间;利用高斯核函数估计每个网格区间的局部密度;联合多维度网格密度分布信息,寻找无连接或弱连接高密度网格之间的区域,同时统计同区域的波峰数量,从而自适应确定各区域的Eps及Minpts参数;使用每个区域独有的参数作为DBSCAN算法输入,并进行聚类。实验结果表明,该算法能够在聚类过程中自适应确定每个局部区域的Eps和Minpts参数,聚类准确率高且耗时较低。展开更多
内河水上交通事故时有发生,对水路运输安全、高效发展带来威胁。研究提出一种基于自适应参数的DBSCAN(Density-Based Spatial Clustering of Applications with Noise)方法,用于识别内河事故黑点水域。该方法支持对邻域半径ε和邻域中...内河水上交通事故时有发生,对水路运输安全、高效发展带来威胁。研究提出一种基于自适应参数的DBSCAN(Density-Based Spatial Clustering of Applications with Noise)方法,用于识别内河事故黑点水域。该方法支持对邻域半径ε和邻域中数据对象数目阈值P_(min)参数的自动选取,可提高聚类分析的精度和效率。基于2010—2019年长江干线下游散货船舶事故数据开展案例研究,对各典型事故黑点段的事故特征和事故原因进行分析,得到8个事故黑点。此外,采用Getis-Ord General G聚类识别事故黑点中的高等级事故区域,得到事故黑点及高等级事故主要分布于江心洲、桥区、港口码头区域。研究结果与实际情况基本吻合,一定程度上表明了该方法在内河水上交通事故分布特征分析上的科学性和实用性。展开更多
基金The author extends his appreciation to theDeputyship forResearch&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the project number(IFPSAU-2021/01/17758).
文摘Finding clusters based on density represents a significant class of clustering algorithms.These methods can discover clusters of various shapes and sizes.The most studied algorithm in this class is theDensity-Based Spatial Clustering of Applications with Noise(DBSCAN).It identifies clusters by grouping the densely connected objects into one group and discarding the noise objects.It requires two input parameters:epsilon(fixed neighborhood radius)and MinPts(the lowest number of objects in epsilon).However,it can’t handle clusters of various densities since it uses a global value for epsilon.This article proposes an adaptation of the DBSCAN method so it can discover clusters of varied densities besides reducing the required number of input parameters to only one.Only user input in the proposed method is the MinPts.Epsilon on the other hand,is computed automatically based on statistical information of the dataset.The proposed method finds the core distance for each object in the dataset,takes the average of these distances as the first value of epsilon,and finds the clusters satisfying this density level.The remaining unclustered objects will be clustered using a new value of epsilon that equals the average core distances of unclustered objects.This process continues until all objects have been clustered or the remaining unclustered objects are less than 0.006 of the dataset’s size.The proposed method requires MinPts only as an input parameter because epsilon is computed from data.Benchmark datasets were used to evaluate the effectiveness of the proposed method that produced promising results.Practical experiments demonstrate that the outstanding ability of the proposed method to detect clusters of different densities even if there is no separation between them.The accuracy of the method ranges from 92%to 100%for the experimented datasets.
文摘经典DBSCAN(density based spatial clustering of applications with noise)算法需要人工指定邻域半径(Eps)和点数阈值(Minpts),且均为全局参数,导致聚类准确率低。针对此问题,为了提高经典DBSCAN聚类算法的聚类准确率,基于网格划分思想,提出了一种局部自适应DBSCAN聚类算法。根据数据集自身特征生成网格空间,将特征数据映射至相应的网格空间;利用高斯核函数估计每个网格区间的局部密度;联合多维度网格密度分布信息,寻找无连接或弱连接高密度网格之间的区域,同时统计同区域的波峰数量,从而自适应确定各区域的Eps及Minpts参数;使用每个区域独有的参数作为DBSCAN算法输入,并进行聚类。实验结果表明,该算法能够在聚类过程中自适应确定每个局部区域的Eps和Minpts参数,聚类准确率高且耗时较低。
文摘内河水上交通事故时有发生,对水路运输安全、高效发展带来威胁。研究提出一种基于自适应参数的DBSCAN(Density-Based Spatial Clustering of Applications with Noise)方法,用于识别内河事故黑点水域。该方法支持对邻域半径ε和邻域中数据对象数目阈值P_(min)参数的自动选取,可提高聚类分析的精度和效率。基于2010—2019年长江干线下游散货船舶事故数据开展案例研究,对各典型事故黑点段的事故特征和事故原因进行分析,得到8个事故黑点。此外,采用Getis-Ord General G聚类识别事故黑点中的高等级事故区域,得到事故黑点及高等级事故主要分布于江心洲、桥区、港口码头区域。研究结果与实际情况基本吻合,一定程度上表明了该方法在内河水上交通事故分布特征分析上的科学性和实用性。