Abstract: An improved clustering algorithm based on the density-isoline clustering (DILC) algorithm is presented. The new algorithm handles noise better than DILC and does not need to calculate the cluster centers iteratively, because samples are assigned to clusters in batches instead of one by one. Repeated experiments demonstrate that the improved density-isoline clustering algorithm is significantly more efficient when clustering noisy data, overcomes the drawbacks of the traditional DILC algorithm in dealing with noise, and greatly reduces running time.
Abstract: The density-based approach to clustering is widely used because it is easy to implement and can detect arbitrarily shaped clusters in the presence of noisy data points without prior knowledge of the number of clusters to be identified. Density-based spatial clustering of applications with noise (DBSCAN) is the first algorithm proposed in the literature that uses the density-based notion for cluster detection. Because most real data sets today contain feature spaces with adjacent nested clusters, DBSCAN is not suitable for detecting adjacent clusters of varying density: it relies on two global density parameters, the neighborhood radius and the minimum number of points in a neighborhood. The performance of DBSCAN therefore depends on these initial parameter settings. For DBSCAN to work properly, the neighborhood radius must be smaller than the distance between two clusters; otherwise the algorithm merges the two clusters and detects them as a single cluster. In this paper: 1) We propose an improved version of the DBSCAN algorithm that detects adjacent clusters of varying density by using the concept of neighborhood difference together with the density-based approach, without adding much computational complexity to the original DBSCAN algorithm. 2) We validate our experimental results with the space density indexing (SDI) internal cluster measure, recently proposed by one of our authors, to demonstrate the quality of the proposed clustering method. The experimental results also suggest that the proposed method is effective in detecting adjacent nested clusters of varying density.
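The merging behaviour described in this abstract can be illustrated with a minimal sketch of textbook DBSCAN in plain Python. This is not the improved algorithm from the paper; the function name `dbscan`, the parameter names `eps` and `min_pts`, and the toy data are assumptions made for illustration only.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: return a cluster id (0, 1, ...) per point, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (point i itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Two groups of points on a line, separated by a gap of 1.5 (hypothetical data).
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0),
          (3.0, 0.0), (3.5, 0.0), (4.0, 0.0), (4.5, 0.0)]

print(len(set(dbscan(points, eps=0.6, min_pts=2))))  # 2: radius smaller than the gap
print(len(set(dbscan(points, eps=2.0, min_pts=2))))  # 1: radius larger than the gap
```

With a single global radius, the second call cannot keep the two groups apart, which is the limitation the abstract attributes to the original DBSCAN.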
Funding: Professor Hong Yu of the Intelligent Fishery Innovative Team (No. C202109), School of Information Engineering, Dalian Ocean University, for her support of this work; funded by the National Natural Science Foundation of China (No. 31800615 and No. 21933010).
Abstract: Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories. It is usually a critical step in interpreting complex conformational changes or interaction mechanisms. As one of the density-based clustering algorithms, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as simulation lengths grow rapidly with increasing computing power, the low computational efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP, named K-means find density peaks (KFDP), to solve the problem of heavy resource consumption. In KFDP, the points are first clustered by a highly efficient clustering algorithm such as K-means. The cluster centers are defined as typical points, each with a weight that represents the cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP achieves accuracy comparable to FDP while its computational complexity is reduced from O(n^2) to O(n). We apply and test the KFDP method on trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparisons with K-means and density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
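The second stage described here, running a density-peaks search over weighted typical points, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes the typical points and their weights have already been produced by a first-stage clustering, uses a plain cutoff kernel in which each point's density is the summed weight within a hypothetical cutoff `d_c`, and omits the core/boundary/halo refinement. The function name and the toy data are assumptions.

```python
from math import dist


def density_peaks(points, weights, d_c):
    """Return (rho, delta) for weighted points, following the density-peaks idea.

    rho[i]   : summed weight of all points within d_c of point i (itself included)
    delta[i] : distance to the nearest point with higher rho, or the largest
               distance to any point if no point has higher rho
    """
    n = len(points)
    rho = [
        sum(weights[j] for j in range(n) if dist(points[i], points[j]) <= d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(dist(points[i], points[j]) for j in range(n)))
    return rho, delta


# Hypothetical typical points (e.g. K-means centers) and their cluster sizes.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
sizes = [50, 20, 5, 40, 5, 3]

rho, delta = density_peaks(centers, sizes, d_c=1.2)
# Points with both high rho and high delta are candidate cluster centers.
scores = [r * d for r, d in zip(rho, delta)]
peaks = sorted(range(len(centers)), key=lambda i: scores[i], reverse=True)[:2]
print(sorted(peaks))  # [1, 4]
```

Because the search runs over a small number of typical points instead of every trajectory frame, the quadratic pairwise-distance step stays cheap, which matches the efficiency argument made in the abstract.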
Abstract: This paper addresses the identification of piecewise autoregressive systems with exogenous input (PWARX) models using a clustering-based solution. The problem involves estimating both the parameters of the affine sub-models and the hyperplanes that define the partitions of the state-input regression. Existing identification methods have three main drawbacks that limit their effectiveness. First, most of them may converge to local minima when poorly initialized, because they rely on optimization with nonlinear criteria. Second, they use simple and ineffective techniques to remove outliers. Third, most of them assume that the number of sub-models is known a priori. To overcome these drawbacks, we suggest using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The results presented in this paper illustrate the performance of our methods in comparison with the existing approach. The developed approach is also applied to an olive oil esterification reactor to validate the simulation results.
Funding: Supported by the National Key Research and Development Program of China (2018YFB1003700); the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776); the "333" Project of Jiangsu Province (BRA2017228, BRA2017401); and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Abstract: For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from an overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority class samples into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
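The interpolation step that plain SMOTE relies on can be sketched in a few lines of Python. This shows only the basic SMOTE idea mentioned in the abstract, not DSMOTE or its separate strategies for core and borderline samples; the function name `smote_sample`, its parameters, and the toy data are assumptions.

```python
import random
from math import dist


def smote_sample(minority, k=2, rng=random):
    """Create one synthetic point between a minority sample and a near neighbor."""
    base = rng.choice(minority)
    # The k nearest minority neighbors of the base point, excluding the point itself.
    others = [p for p in minority if p is not base]
    nearest = sorted(others, key=lambda p: dist(base, p))[:k]
    neighbor = rng.choice(nearest)
    lam = rng.random()  # interpolation factor in [0, 1)
    # New point on the segment from base toward neighbor: base + lam * (neighbor - base).
    return tuple(b + lam * (n - b) for b, n in zip(base, neighbor))


# Hypothetical minority-class samples at the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Every synthetic point lies on a segment between two existing minority samples, so a noisy minority sample pulls new points toward itself; this is one reason the abstract removes noise samples before synthesis.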
Funding: Funded by the National Natural Science Foundation of China (42071014).
Abstract: Gobi spans a large area of China, surpassing the combined expanse of mobile dunes and semi-fixed dunes. Its presence significantly influences the movement of sand and dust. However, the complex origins and diverse materials constituting the Gobi result in notable differences in saltation processes across various Gobi surfaces, and it is challenging to describe these processes according to a uniform morphology. It is therefore imperative to describe surface characteristics through parameters such as the three-dimensional (3D) size and shape of gravel. Collecting morphology information for Gobi gravels is essential for studying the genesis of the Gobi and sand saltation. To enhance the efficiency and information yield of gravel parameter measurements, this study conducted field experiments in the Gobi region across Dunhuang City, Guazhou County, and Yumen City (administered by Jiuquan City), Gansu Province, China, in March 2023. A research framework and methodology for measuring 3D parameters of gravel using point clouds were developed, alongside improved calculation formulas for 3D parameters including gravel grain size, volume, flatness, roundness, sphericity, and equivalent grain size. Multi-view geometry technology for 3D reconstruction made it possible to establish an optimal data acquisition scheme characterized by high point cloud reconstruction efficiency and quality. The proposed methodology also incorporated point cloud clustering, segmentation, and filtering techniques to isolate individual gravel point clouds. Advanced point cloud algorithms, including the Oriented Bounding Box (OBB), the point cloud slicing method, and point cloud triangulation, were then used to calculate the 3D parameters of individual gravels. These systematic processes allow precise and detailed characterization of individual gravels. For gravel grain size and volume, the correlation coefficients between point cloud and manual measurements all exceeded 0.9000, confirming the feasibility of the proposed methodology for measuring 3D parameters of individual gravels. The proposed workflow yields accurate calculations of relevant parameters for Gobi gravels, providing essential data support for subsequent studies on Gobi environments.
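Abstract: Flexible arc-suppression devices and fault location techniques in new-type distribution systems both need to fully exploit the transient characteristics of phase currents to achieve phase selection, line selection, and fault location. To address this, the transient distribution characteristics of phase currents under single-phase-to-ground faults in new-type distribution systems are analyzed, and a new single-phase-to-ground fault location method based on differences in the multidimensional time-frequency distribution features of phase currents is proposed. Based on the difference between the fault transient components of the faulted-phase current and those of the healthy-phase currents, the faulted phase is selected with the grey relational degree algorithm. From the phase-current transient waveforms at the monitoring points at the head of each outgoing feeder and at the branch monitoring points of the suspected faulty feeder, 26 multidimensional time-frequency features are extracted. These features are screened and reduced in dimension by variance-optimized t-distributed stochastic neighbor embedding (VTSNE), and the processed feature data are clustered with density-based spatial clustering of applications with noise (DBSCAN) to complete faulty-line selection and faulty-section location. The method was verified on a model of a 10 kV new-type distribution system at a green port. It locates faults accurately and reliably under fault scenarios with different fault inception angles and different transition resistances, places low demands on sampling synchronization accuracy and sampling frequency, and is easy to implement in engineering practice.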
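Abstract: Leakage from underground water pipes is widespread and causes losses of water resources and money. Effective leak detection techniques are lacking, particularly because water supply pipes are buried underground, which makes leaks difficult to detect in time. Long-wavelength polarimetric SAR signals penetrate strongly and can record subsurface moisture information, offering a new opportunity for leak detection. This study uses SAOCOM data and the Singh seven-component polarimetric decomposition to extract surface scattering power in urban areas and, combined with ground measurements, trains three machine learning models (random forest, multilayer perceptron, and XGBoost) to predict leak points. An ensemble model is then built that uses a voting mechanism to improve leak detection accuracy, reaching an accuracy of 81.20%. The predictions are refined with DBSCAN (density-based spatial clustering of applications with noise) density clustering, which reduces the number of potential leak points to 1,265; all true leak points are found to lie within the 150 m buffer zones of the suspected leak points. This study demonstrates the potential of PolSAR data combined with machine learning for detecting leakage in urban water pipes and provides valuable methods and experience for future research.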
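A voting ensemble over three classifiers can be sketched as a simple majority vote. The abstract does not say whether hard or soft voting was used, so hard (majority) voting is an assumption here, as are the function name and the toy predictions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by hard majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical binary predictions (1 = leak, 0 = no leak) from three models.
rf_pred = [1, 0, 1, 0]
mlp_pred = [1, 1, 0, 0]
xgb_pred = [0, 1, 1, 0]

print(majority_vote([rf_pred, mlp_pred, xgb_pred]))  # [1, 1, 1, 0]
```

With three binary classifiers there are no ties, so each site is labeled a leak exactly when at least two models agree.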
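Abstract: Data continuously produced by Internet of Things (IoT) devices are mixed with some anomalous data, which lowers the quality and efficiency of IoT communication data classification. A fast classification method for IoT communication data based on ensemble learning is therefore proposed. Communication data are collected from IoT devices; the isolation forest algorithm is used to determine the anomaly score of each IoT communication data sample, and data with high anomaly scores are removed. The remaining data are consolidated with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, and an ensemble learning algorithm is then applied to classify the IoT communication data quickly. Experimental results show that the classification accuracy of the proposed method stays above 97.2% and that the mean classification time is about 1.55 s, indicating good application potential.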
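Abstract: To address missing alighting-stop data for bus passengers, a three-level completion strategy for passenger bus trip chains is proposed that fuses bus and metro smart-card data. The strategy comprises three levels: a transfer trip-chain completion strategy, a bus trip-chain completion prediction model, and a maximum-probability alighting prediction model. Spatial location constraints in same-day and next-day transfer scenarios are used to complete the alighting points of regular bus trips. An improved version of the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is introduced and combined with a weighted geometric center to build an optimization model for bus alighting stops. A maximum-probability alighting model is built from the alighting probability distribution along each bus route. Together, the three levels form an end-to-end completion framework for bus alighting stops. Experiments on real travel data show that the proposed strategy raises the completion accuracy of bus trip chains to 92.5%, more than 10% higher than traditional algorithms, and that over 80% of the erroneous stops fall within ±1 stop of the actual stop, demonstrating its advantages in accuracy and robustness.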
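The weighted geometric center mentioned in this abstract can be read as a weighted mean of the points in a cluster. The sketch below shows only that calculation; how the paper chooses the weights (for example by visit frequency) is not stated in the abstract, so the weights, the function name, and the sample coordinates are assumptions.

```python
def weighted_center(points, weights):
    """Weighted mean of a set of points, one coordinate at a time."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dims = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    )


# Hypothetical candidate alighting locations in one cluster, weighted by visit count.
candidates = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
visits = [1, 1, 2]

print(weighted_center(candidates, visits))  # (0.5, 2.0)
```

In a completion pipeline of this kind, the resulting center would typically be matched to the nearest stop on the route; that matching step is not shown here.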