Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories, and it is usually a critical step in interpreting complex conformational changes or interaction mechanisms. Among density-based clustering algorithms, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as simulation lengths grow rapidly with increasing computing power, the low computational efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP, named K-means find density peaks (KFDP), to solve the problem of heavy resource consumption. In KFDP, the points are first clustered by a highly efficient clustering algorithm such as K-means. The cluster centers are defined as typical points, each with a weight that represents the cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP achieves accuracy comparable to FDP while its computational complexity is reduced from O(n^2) to O(n). We apply and test the KFDP method on trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparisons with K-means and density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
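
The two-stage idea in the abstract above (K-means first, then density peaks on weighted cluster centers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the weights enter the density, so the sketch assumes a weighted Gaussian-kernel density, and the names `kmeans`, `weighted_density_peaks`, `pick_peaks`, and the cutoff `d_c` are hypothetical. The refinement into core, boundary, and halo points is not shown.

```python
import math
import random


def assign(points, centers):
    """Group points by their nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[j].append(p)
    return groups


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means. Returns typical points and weights (cluster sizes)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        # Move each center to its group mean; leave it in place if the group is empty.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for c, g in zip(centers, groups)
        ]
    sizes = [len(g) for g in assign(points, centers)]
    kept = [(c, w) for c, w in zip(centers, sizes) if w > 0]
    return [c for c, _ in kept], [w for _, w in kept]


def weighted_density_peaks(centers, weights, d_c):
    """Density-peaks quantities on weighted typical points.

    Assumption: rho_i = sum_j w_j * exp(-(d_ij / d_c)^2).
    delta_i is the distance to the nearest point of higher density.
    """
    n = len(centers)
    rho = [
        sum(w * math.exp(-(math.dist(centers[i], c) / d_c) ** 2)
            for c, w in zip(centers, weights))
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        # Break density ties by index so exactly one point has no "higher" point.
        higher = [math.dist(centers[i], centers[j])
                  for j in range(n) if (rho[j], j) > (rho[i], i)]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(math.dist(centers[i], c) for c in centers))
    return rho, delta


def pick_peaks(rho, delta, n_peaks):
    """Indices of the n_peaks typical points with the largest rho * delta."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:n_peaks]
```

Under these assumptions the density-peaks step runs over the k typical points rather than all n points, so its pairwise cost is O(k^2), and the K-means pass costs O(nk) per iteration. That is one way to read the reported reduction from O(n^2) to O(n) when k is fixed.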
Density-based clustering is widely used because it is easy to implement and can detect arbitrarily shaped clusters in the presence of noisy data points without prior knowledge of the number of clusters to be identified. Density-based spatial clustering of applications with noise (DBSCAN) was the first algorithm proposed in the literature that uses the density-based notion for cluster detection. Because most real data sets today contain feature spaces with adjacent nested clusters, DBSCAN is not suitable for detecting adjacent clusters of varying density, owing to its use of global density parameters: the neighborhood radius and the minimum number of points in a neighborhood. The efficiency of DBSCAN therefore depends on these initial parameter settings. For DBSCAN to work properly, the neighborhood radius must be less than the distance between two clusters; otherwise the algorithm merges the two clusters and detects them as a single cluster. In this paper: 1) we propose an improved version of the DBSCAN algorithm that detects adjacent clusters of varying density by using the concept of neighborhood difference together with the density-based approach, without adding much computational complexity to the original DBSCAN algorithm; 2) we validate our experimental results using the space density indexing (SDI) internal cluster measure, recently proposed by one of the authors, to demonstrate the quality of the proposed clustering method. Our experimental results also suggest that the proposed method is effective in detecting adjacent nested clusters of variable density.
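
The dependence on the global neighborhood radius described above can be seen in a minimal sketch of the original DBSCAN (not the improved variant, whose neighborhood-difference step the abstract does not detail). The function name `dbscan` and the parameter names `eps` and `min_pts` are illustrative, and a point is assumed to count as its own neighbor.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    eps is the global neighborhood radius; min_pts is the minimum number of
    points (the point itself included) that a core point's neighborhood holds.
    """
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels


# Two groups separated by a gap of 3, plus one isolated point.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0), (20, 0)]
print(dbscan(points, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
print(dbscan(points, eps=3.5, min_pts=2))  # [0, 0, 0, 0, 0, 0, -1]
```

With `eps` below the gap the two groups stay separate. Once `eps` exceeds the gap they are merged into one cluster, which is the failure mode the abstract describes.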
This paper deals with the identification of piecewise autoregressive systems with exogenous input (PWARX) models based on a clustering solution. The problem involves estimating both the parameters of the affine sub-models and the hyperplanes that define the partitions of the state-input regression. Existing identification methods have three main drawbacks that limit their effectiveness. First, most of them may converge to local minima when poorly initialized, because they rely on optimization of nonlinear criteria. Second, they use simple and ineffective techniques to remove outliers. Third, most of them assume that the number of sub-models is known a priori. To overcome these drawbacks, we suggest using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The results presented in this paper illustrate the performance of our method in comparison with the existing approach. An application of the developed approach to an olive oil esterification reactor is also presented to validate the simulation results.
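To address the parameter sensitivity and incomplete clustering results of clustering algorithms caused by dense vessel traffic and highly complex trajectories in port waters, a method for screening similarity distances is proposed. The method uses longitude and latitude, speed over ground, course, and heading data to construct multiple similarity distances. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to obtain the clustering result for each similarity distance. Three indicators are computed for each clustering result: the silhouette coefficient, the Davies-Bouldin index, and the number of clusters. The stability of these three indicators under varying DBSCAN hyperparameters is analyzed, and the similarity distances with high stability are selected. Trajectory clustering is then performed with the selected stable similarity distances, and the optimal similarity distance is identified. Experiments verify the effectiveness of the screening method and show that the combination of the Hausdorff distance based on longitude and latitude and the dynamic time warping (DTW) distance based on course gives the best clustering results, completes vessel trajectory clustering in port more comprehensively, and identifies typical traffic flows. The results provide an effective method for port traffic flow identification and feature data mining, and guidance for selecting similarity distances in vessel trajectory clustering.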
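
The two distances that the abstract above reports as the best combination can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: positions are treated as planar coordinates (raw longitude and latitude would need projection or a geodesic distance), course values are in degrees and compared with wrap-around at 360, and the function names are hypothetical. How the paper weights or combines the two distances is not stated in the abstract, so no combination is shown.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sequences of planar points."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def course_diff(c1, c2):
    """Smallest absolute difference between two courses in degrees (0 to 180)."""
    d = abs(c1 - c2) % 360
    return min(d, 360 - d)


def dtw_course(s, t):
    """Dynamic time warping distance between two course sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # d[i][j] is the minimal accumulated cost of aligning s[:i] with t[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = course_diff(s[i - 1], t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

A pairwise distance matrix built from such functions could then be passed to a DBSCAN implementation that accepts precomputed distances, which is one plausible way to cluster trajectories of unequal length.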
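To address cooperative task scheduling for multiple unmanned aerial vehicles (UAVs) in static and dynamic rescue scenarios, a density-based spatial clustering of applications with noise-consensus-based bundle algorithm (DBSCAN-CBBA) is proposed. First, to account for scenario uncertainty during task execution and the payload limits on the supplies that UAVs can carry, a multi-task allocation model that better reflects actual rescue conditions is established. Then, the bundle construction structure of the consensus-based bundle algorithm is optimized to improve the algorithm's efficiency and its ability to search for the optimal solution. In the first phase, a candidate task set is generated by the density-based clustering algorithm, and a non-candidate task set is constructed randomly; in the second phase, conflicts that arise because the UAVs build their bundles independently are resolved through communication among the UAVs. Finally, the algorithm is applied to static and real-time dynamic task allocation scenarios. Simulation results show that the algorithm can find reasonable task allocation schemes fairly efficiently.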
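To effectively identify and remove abnormal data from measured wind turbine data, an abnormal data identification algorithm based on manifold learning is proposed by analyzing the high-dimensional features of the measured data. First, the k-nearest-neighbor mutual information algorithm is used to select wind turbine feature variables. Next, an optimized t-distributed stochastic neighbor embedding (t-SNE) algorithm, in which the distance metric between samples is replaced by a weighted sum of the Euclidean metric and the local principal component analysis (LPCA) difference, is used to extract low-dimensional features with intrinsic regularity from the high-dimensional manifold data, so that data with different distribution characteristics are clearly separated in the two-dimensional visualization space. Finally, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to cluster the data in the two-dimensional space. The results show that, compared with principal component analysis (PCA), locally linear embedding (LLE), and the original t-SNE algorithm, the proposed method can visually separate and cluster data from a variety of complex operating conditions and can identify and remove abnormal data.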
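To address the difficulty of accurately identifying changes in customer profile information caused by increasingly volatile customer industry characteristics in recent years, this paper proposes a data-driven method for identifying load characteristic anomalies. First, a two-stage method for constructing typical industry load patterns is proposed: hierarchical density-based spatial clustering of applications with noise (HDBSCAN) is used to extract each customer's typical daily load curves in different scenarios, and an improved K-means algorithm is used to cluster the extracted typical daily load curves to construct the typical load patterns of the industry. Second, an intelligent method for judging load characteristic anomalies across multi-dimensional scenarios is proposed: customer load features are constructed, the entropy weight method is used to assess the relative importance of the typical industry scenarios, the one-class support vector machine (OCSVM) algorithm is used to quantify the degree of anomaly of the customer's load features in each scenario, and a comprehensive suspicion score for each customer is obtained by weighted calculation and then ranked, so that customers with abnormal load characteristics are accurately identified. Finally, a case study is carried out with actual customer data from a certain region. The simulation results show that the proposed method has good feasibility and practical value in constructing typical industry load scenarios and identifying load characteristic anomalies.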
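
The weighting step in the abstract above can be illustrated with the standard entropy weight method followed by a weighted sum. This is a sketch under assumptions: rows are taken to be customers and columns to be scenarios, all entries are non-negative with a positive column sum, and the names `entropy_weights` and `suspicion_score` are hypothetical. The per-scenario anomaly degrees would come from a one-class SVM in the paper; here they are plain numbers supplied by the caller.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method for an m x n matrix (m >= 2 rows, n indicator columns).

    A column whose values vary more across rows has lower entropy and
    therefore receives a larger weight.
    """
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [x / total for x in col]
        # Normalized Shannon entropy in [0, 1]; the term 0 * log(0) is taken as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s <= 1e-12:
        # Every column is uniform, so fall back to equal weights.
        return [1.0 / n] * n
    return [d / s for d in divergences]


def suspicion_score(weights, anomaly_degrees):
    """Weighted sum of per-scenario anomaly degrees for one customer."""
    return sum(w * a for w, a in zip(weights, anomaly_degrees))
```

For example, `entropy_weights([[1, 1], [1, 3]])` puts nearly all the weight on the second column, because the first column is identical across rows and carries no discriminating information.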
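By analyzing the clustering performance of the density-based spatial clustering of applications with noise (DBSCAN) algorithm and the fuzzy C-means (FCM) clustering algorithm, this paper proposes a fast adaptive algorithm for generating typical flight tracks based on geometric algebra. First, the K-means clustering algorithm is used to normalize flight operating times. Then, drawing on the superior spatiotemporal representation and computational capability of geometric algebra, geometric algebra descriptions of track turn determination, DBSCAN clustering, and FCM clustering are given. Finally, in the geometric algebra space, tracks in the turning state and in the straight-line state are adaptively clustered with DBSCAN and FCM, respectively, to form typical tracks. Experimental results show that the proposed adaptive typical track generation is more than 30% faster than the Euclidean-space method.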
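To address the low degree of automation in power line point cloud extraction and the tendency of results to suffer under-segmentation or over-segmentation depending on parameters, and taking into account the distribution characteristics of airborne light detection and ranging (LiDAR) point cloud data, a power line extraction method for laser point clouds based on an improved spatial density clustering algorithm is proposed. The method first performs coarse extraction of the power line point cloud with an elevation filtering algorithm improved by spatial partitioning. Second, it obtains the optimal parameters for spatial density clustering using a distance-density-based method and a mathematical expectation calculation, which avoids tedious manual parameter tuning. Experimental results show that, compared with the spatial density clustering algorithm, the proposed algorithm is significantly more efficient, reducing power line extraction time by about 60% and achieving automated, efficient extraction of individual power line point clouds.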
Funding and acknowledgments: Professor Hong Yu of the Intelligent Fishery Innovative Team (No. C202109), School of Information Engineering, Dalian Ocean University, supported this work, which was funded by the National Natural Science Foundation of China (Nos. 31800615 and 21933010).