Clustering evolving data streams must be performed in limited time while maintaining reasonable quality. Existing micro-clustering-based methods do not consider the distribution of data points inside a micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm that uses leader clustering. The algorithm follows a two-phase design: the online phase selects the appropriate mini-micro or micro-cluster leaders based on the distribution of data points within the micro-clusters, and the leader centers are then passed to the offline phase to form the final clusters. By carefully choosing between the two kinds of micro leaders, LeaDen-Stream reduces the time complexity of clustering while maintaining cluster quality. A pruning strategy also separates real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of the method.
In recent years, there has been a concerted effort to improve anomaly detection techniques, particularly in the context of high-dimensional, distributed clinical data. Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy, personalising treatment plans, and optimising resource allocation to enhance clinical outcomes. Nonetheless, this domain faces unique challenges, such as irregular data collection, inconsistent data quality, and patient-specific structural variations. This paper proposes a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data to address these challenges. The strategy combines HPO-based optimal Density-Based Spatial Clustering of Applications with Noise for clustering patient exercise data, facilitating efficient anomaly identification. Subsequently, a stochastic method based on the Interquartile Range filters unreliable data points, ensuring that medical tools and professionals receive only the most pertinent and accurate information. The primary objective of this study is to equip healthcare professionals and researchers with a robust tool for managing extensive, high-dimensional clinical datasets, enabling effective isolation and removal of aberrant data points. Furthermore, a regression model has been developed using Automated Machine Learning (AutoML) to assess the impact of the ensemble abnormal pattern detection approach. Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML. Experimental results show that applying this hybrid model to patient rehabilitation data yields a notable improvement in AutoML performance, with an average gain of 0.041 in the R2 score, surpassing traditional regression models.
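As a rough illustration of the pipeline this abstract describes (density-based clustering followed by interquartile-range filtering), the sketch below combines scikit-learn's DBSCAN with a simple IQR fence. The feature matrix, DBSCAN parameters, and the column used for the IQR test are illustrative assumptions, not the paper's HPO-tuned settings, and the AutoML stage is omitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

def iqr_mask(values, k=1.5):
    """Boolean mask of points lying inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

# Illustrative "patient exercise" feature matrix: two session types plus scattered aberrant records.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (240, 4)),
               rng.normal(3, 0.3, (240, 4)),
               rng.uniform(-4, 7, (20, 4))])

# Step 1: density-based clustering; DBSCAN labels sparse points as -1 (anomalies).
labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(StandardScaler().fit_transform(X))

# Step 2: inside each retained cluster, drop records that fall outside the IQR fence
# on a chosen measurement (column 0 here, purely as an example).
keep = np.zeros(len(X), dtype=bool)
for c in set(labels) - {-1}:
    idx = np.where(labels == c)[0]
    keep[idx] = iqr_mask(X[idx, 0])

X_clean = X[keep]
print(f"kept {keep.sum()} of {len(X)} records")
```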
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC uses the idea of density-based clustering to split the original data into a number of fragmented clusters while cutting off noise and outliers. In the second phase, CDC applies the concept of the K-means algorithm to select a larger cluster as the center; this center cluster then merges smaller clusters that satisfy a set of constraint rules. Because the smaller clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the number of calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC performs better.
Density-based clustering is an important category among clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation technologies or other techniques for handling missing values are not suitable for density-based clustering and decrease clustering result quality. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performances of the proposed algorithms are evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.
Addressing the issue that flight plans between Chinese city pairs typically rely on a single route, lacking alternative paths and posing challenges in responding to emergencies, this study employs the "quantile-inflection point method" to analyze specific deviation trajectories, determine deviation thresholds, and identify commonly used deviation paths. By combining multiple similarity metrics, including Euclidean distance, Hausdorff distance, and sector edit distance, with the density-based spatial clustering of applications with noise (DBSCAN) algorithm, the study clusters deviation trajectories to construct a multi-option trajectory set for city pairs. A case study of 23,578 flight trajectories between the Guangzhou airport cluster and the Shanghai airport cluster demonstrates the effectiveness of the proposed framework. Experimental results show that sector edit distance achieves superior clustering performance compared to Euclidean and Hausdorff distances, with higher silhouette coefficients and lower Davies-Bouldin indices, ensuring better intra-cluster compactness and inter-cluster separation. Based on the clustering results, 19 representative trajectory options are identified, covering both nominal and deviation paths, which significantly enhance route diversity and reflect actual flight practices. This provides a practical basis for optimizing flight paths and scheduling, enhancing the flexibility of route selection for flights between city pairs.
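The trajectory-clustering step can be approximated with standard tools: a symmetric Hausdorff distance matrix fed to DBSCAN with a precomputed metric. The toy trajectories, eps, and min_samples below are assumptions, and the paper's preferred sector edit distance is not reproduced.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.cluster import DBSCAN

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two trajectories (arrays of 2-D points)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

# Toy trajectories: a "nominal" path and a laterally offset "deviation" path, both noisy.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 30)
proto_a = np.c_[t, 0.2 * np.sin(2 * np.pi * t)]
proto_b = np.c_[t, 0.2 * np.sin(2 * np.pi * t) + 0.5]
trajectories = [p + rng.normal(0, 0.02, p.shape) for p in [proto_a] * 15 + [proto_b] * 15]

# Pairwise distance matrix, then DBSCAN on the precomputed metric.
n = len(trajectories)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = hausdorff(trajectories[i], trajectories[j])

labels = DBSCAN(eps=0.1, min_samples=3, metric="precomputed").fit_predict(D)
print(labels)
```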
Cluster-based models have numerous application scenarios in vehicular ad-hoc networks (VANETs) and can greatly help improve the communication performance of VANETs. However, the frequent movement of vehicles can often lead to changes in the network topology, thereby reducing cluster stability in urban scenarios. To address this issue, we propose a clustering model based on the density peak clustering (DPC) method and the sparrow search algorithm (SSA), named SDPC. First, the model constructs a fitness function based on the parameters obtained from the DPC method and deploys the SSA for iterative optimization to select cluster heads (CHs). Then, the vehicles that have not been selected as CHs are assigned to appropriate clusters by comprehensively considering the distance parameter and the link-reliability parameter. Finally, cluster maintenance strategies are considered to handle changes in the clusters' organizational structure. To verify the performance of the model, we conducted a simulation in a real-world scenario using multiple metrics related to cluster stability. The results show that, compared with APROVE and GAPC, SDPC exhibits clear performance advantages, indicating that SDPC can effectively ensure VANET cluster stability in urban scenarios.
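For readers unfamiliar with the DPC quantities the fitness function builds on, the sketch below computes the standard local density rho and distance-to-denser-point delta; vehicles scoring high on both are natural cluster-head candidates. The cutoff-distance heuristic and toy vehicle positions are assumptions, and the SSA iteration and link-reliability terms are not reproduced.

```python
import numpy as np
from scipy.spatial.distance import squareform, pdist

def dpc_scores(X, dc=None):
    """Return (rho, delta): local density and distance to the nearest denser point."""
    D = squareform(pdist(X))
    if dc is None:                       # common heuristic: small quantile of pairwise distances
        dc = np.quantile(D[np.triu_indices_from(D, k=1)], 0.02)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0   # Gaussian-kernel density, minus self
    delta = np.zeros(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if denser.size == 0 else D[i, denser].min()
    return rho, delta

# Vehicles with high rho * delta are natural cluster-head candidates.
X = np.random.default_rng(2).uniform(0, 1000, size=(60, 2))   # toy vehicle positions (m)
rho, delta = dpc_scores(X)
candidates = np.argsort(rho * delta)[::-1][:5]
print("cluster-head candidates:", candidates)
```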
Reliable Cluster Head (CH) selection-based routing protocols are necessary for increasing packet transmission efficiency with optimal path discovery that does not degrade transmission reliability. In this paper, the Hybrid Golden Jackal and Improved Whale Optimization Algorithm (HGJIWOA) is proposed as an effective and optimal routing protocol that guarantees efficient routing of data packets along the routes established between the CHs and the movable sink. HGJIWOA includes the phases of a Dynamic Lens-Imaging Learning Strategy and Novel Update Rules for determining the reliable route essential for broadcasting data packets, attained through fitness-measure-estimation-based CH selection. The CH selection process, achieved using the Golden Jackal Optimization Algorithm (GJOA), depends entirely on the factors of maintainability, consistency, trust, delay, and energy. The adopted GJOA plays a dominant role in determining the optimal routing path based on reduced delay and minimal distance. The protocol further utilizes the Improved Whale Optimization Algorithm (IWOA) to forward data from the chosen CHs to the BS via an optimized route determined by energy and distance. It also includes a reliable route maintenance process that aids in deciding the route through which data should be transmitted or re-routed. Simulation outcomes of the proposed HGJIWOA mechanism with different numbers of sensor nodes confirmed an improved mean throughput of 18.21%, sustained residual energy of 19.64%, and a minimized end-to-end delay of 21.82%, better than competitive CH selection approaches.
Data clustering is an essential technique for analyzing complex datasets and continues to be a central research topic in data analysis. Traditional clustering algorithms, such as K-means, are widely used due to their simplicity and efficiency. This paper proposes a novel Spiral Mechanism-Optimized Phasmatodea Population Evolution Algorithm (SPPE) to improve clustering performance. The SPPE algorithm introduces several enhancements to the standard Phasmatodea Population Evolution (PPE) algorithm. Firstly, a Variable Neighborhood Search (VNS) factor is incorporated to strengthen the local search capability and foster population diversity. Secondly, a position update model, incorporating a spiral mechanism, is designed to improve the algorithm's global exploration and convergence speed. Finally, a dynamic balancing factor, guided by fitness values, adjusts the search process to balance exploration and exploitation effectively. The performance of SPPE is first validated on CEC2013 benchmark functions, where it demonstrates excellent convergence speed and superior optimization results compared to several state-of-the-art metaheuristic algorithms. To further verify its practical applicability, SPPE is combined with the K-means algorithm for data clustering and tested on seven datasets. Experimental results show that SPPE-K-means improves clustering accuracy, reduces dependency on initialization, and outperforms other clustering approaches. This study highlights SPPE's robustness and efficiency in solving both optimization and clustering challenges, making it a promising tool for complex data analysis tasks.
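The PPE/SPPE operators are not reproduced here; the sketch below only shows the objective such a metaheuristic optimizes when wrapped around K-means: each candidate solution is a set of centroids scored by the within-cluster sum of squared errors (SSE). A plain accept-if-better random perturbation stands in for SPPE's spiral and VNS updates.

```python
import numpy as np

def sse(centroids, X):
    """Within-cluster sum of squared errors for a (k, d) centroid matrix."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def random_search_kmeans(X, k, iters=200, step=0.2, seed=0):
    """Placeholder metaheuristic: perturb the best centroid set, keep improvements."""
    rng = np.random.default_rng(seed)
    best = X[rng.choice(len(X), k, replace=False)].copy()
    best_f = sse(best, X)
    for _ in range(iters):
        cand = best + rng.normal(scale=step, size=best.shape)
        f = sse(cand, X)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f

X = np.random.default_rng(3).normal(size=(300, 2))
centroids, fitness = random_search_kmeans(X, k=3)
print(round(fitness, 2))
```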
As vehicular networks grow increasingly complex due to high node mobility and dynamic traffic conditions, efficient clustering mechanisms are vital to ensure stable and scalable communication. Recent studies have emphasized the need for adaptive clustering strategies to improve performance in Intelligent Transportation Systems (ITS). This paper presents the Grasshopper Optimization Algorithm for Vehicular Network Clustering (GOA-VNET), an innovative approach to optimal vehicular clustering in Vehicular Ad-Hoc Networks (VANETs) that leverages the Grasshopper Optimization Algorithm (GOA) to address the critical challenges of traffic congestion and communication inefficiency in ITS. The proposed GOA-VNET employs an iterative and interactive optimization mechanism to dynamically adjust node positions and cluster configurations, ensuring robust adaptability to varying vehicular densities and transmission ranges. Key features of GOA-VNET include the utilization of attraction-zone, repulsion-zone, and comfort-zone parameters, which collectively enhance clustering efficiency and minimize congestion within Regions of Interest (ROI). By managing cluster configurations and node densities effectively, GOA-VNET ensures balanced load distribution and seamless data transmission, even in scenarios with high vehicular densities and varying transmission ranges. Comparative evaluations against the Whale Optimization Algorithm (WOA) and Grey Wolf Optimization (GWO) demonstrate that GOA-VNET consistently outperforms these methods by achieving superior clustering efficiency, reducing the number of clusters by up to 10% in high-density scenarios, and improving data transmission reliability. Simulation results reveal that, under a 100-600 m transmission range, GOA-VNET achieves an average reduction of 8%-15% in the number of clusters and maintains a 5%-10% improvement in packet delivery ratio (PDR) compared to baseline algorithms. Additionally, the algorithm incorporates a heat-transfer-inspired load-balancing mechanism, ensuring equitable distribution of nodes among cluster leaders (CLs) and maintaining a stable network environment. These results validate GOA-VNET as a reliable and scalable solution for VANETs, with significant potential to support next-generation ITS. Future research could further enhance the algorithm by integrating multi-objective optimization techniques and exploring broader applications in complex traffic scenarios.
We propose a robust earthquake clustering method: the Bayesian Gaussian mixture model with nearest-neighbor distance (BGMM-NND) algorithm. Unlike the conventional nearest-neighbor distance method, the BGMM-NND algorithm eliminates the need for hyperparameter tuning or reliance on fixed thresholds, offering enhanced flexibility for clustering across varied seismic scales. By integrating cumulative probability and the BGMM with principal component analysis (PCA), the BGMM-NND algorithm effectively distinguishes between background and triggered earthquakes while retaining the magnitude component and resolving the issue of excessively large spatial cluster domains. We apply the BGMM-NND algorithm to the Sichuan–Yunnan seismic catalog from 1971 to 2024, revealing notable variations in earthquake frequency, triggering characteristics, and recurrence patterns across different fault zones. Distinct clustering and triggering behaviors are identified along different segments of the Longmenshan Fault. Multiple seismic modes, namely the short-distance mode, the medium-distance mode, the repeating-like mode, the uniform background mode, and the Wenchuan mode, are uncovered. The algorithm's flexibility and robust performance in earthquake clustering make it a valuable tool for exploring seismicity characteristics, offering new insights into earthquake clustering and the spatiotemporal patterns of seismic activity.
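A minimal sketch of the modelling idea, assuming scikit-learn's BayesianGaussianMixture as the BGMM: surplus mixture components receive near-zero weights, which is what removes the need for a fixed background/triggered threshold. The three-column feature matrix below is a synthetic stand-in for nearest-neighbor rescaled time, rescaled distance, and magnitude, not the paper's exact construction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture

# Illustrative stand-in for nearest-neighbor features per event:
# columns ~ [log rescaled time, log rescaled distance, magnitude].
rng = np.random.default_rng(4)
background = rng.normal(loc=[0.5, 0.5, 3.0], scale=0.4, size=(400, 3))
triggered = rng.normal(loc=[-1.5, -1.5, 3.5], scale=0.4, size=(150, 3))
F = np.vstack([background, triggered])

# PCA to decorrelate, then a Bayesian GMM: surplus components get near-zero weight,
# so no manual threshold between "background" and "triggered" modes is needed.
Z = PCA(n_components=3).fit_transform(F)
bgmm = BayesianGaussianMixture(n_components=6, weight_concentration_prior=0.01,
                               covariance_type="full", max_iter=200,
                               random_state=0).fit(Z)
labels = bgmm.predict(Z)
print(np.round(bgmm.weights_, 3))
```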
Wireless Sensor Networks (WSNs), as a crucial component of the Internet of Things (IoT), are widely used in environmental monitoring, industrial control, and security surveillance. However, WSNs still face challenges such as inaccurate node clustering, low energy efficiency, and shortened network lifespan in practical deployments, which significantly limit their large-scale application. To address these issues, this paper proposes an Adaptive Chaotic Ant Colony Optimization algorithm (AC-ACO), aiming to optimize the energy utilization and system lifespan of WSNs. AC-ACO combines the path-planning capability of Ant Colony Optimization (ACO) with the dynamic characteristics of chaotic mapping and introduces an adaptive mechanism to enhance the algorithm's flexibility and adaptability. By dynamically adjusting the pheromone evaporation factor and heuristic weights, efficient node clustering is achieved. Additionally, a chaotic mapping initialization strategy is employed to enhance population diversity and avoid premature convergence. To validate the algorithm's performance, this paper compares AC-ACO with clustering methods such as Low-Energy Adaptive Clustering Hierarchy (LEACH), ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Simulation results demonstrate that AC-ACO outperforms the compared algorithms in key metrics such as energy consumption optimization, network lifetime extension, and communication delay reduction, providing an efficient solution for improving energy efficiency and ensuring long-term stable operation of wireless sensor networks.
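The chaotic initialization mentioned in the abstract is commonly realized with a logistic map; the sketch below shows that idea for generating diverse initial candidate solutions. The map, its parameters, and the cluster-head encoding are assumptions, and the ACO pheromone update itself is not reproduced.

```python
import numpy as np

def chaotic_init(n_agents, dim, lower, upper, x0=0.7, r=4.0):
    """Initialize candidate solutions with a logistic map instead of uniform sampling."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = np.empty((n_agents, dim))
    x = x0
    for i in range(n_agents):
        for j in range(dim):
            x = r * x * (1.0 - x)          # logistic map, chaotic for r = 4
            pos[i, j] = lower[j] + x * (upper[j] - lower[j])
    return pos

# Example: 20 candidate cluster-head layouts over a 100 m x 100 m field (5 heads, x/y each).
agents = chaotic_init(n_agents=20, dim=10, lower=[0] * 10, upper=[100] * 10)
print(agents.shape, agents.min().round(2), agents.max().round(2))
```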
Recognizing discontinuities within rock masses is a critical aspect of rock engineering. The development of remote sensing technologies has significantly enhanced the quality and quantity of the point clouds collected from rock outcrops. In response, we propose a workflow that balances accuracy and efficiency to extract discontinuities from massive point clouds. The proposed method employs voxel filtering to downsample point clouds, constructs a point cloud topology using K-d trees, utilizes principal component analysis to calculate the point cloud normals, and employs the pointwise clustering (PWC) algorithm to extract discontinuities from rock outcrop point clouds. This method provides information on the location and orientation (dip direction and dip angle) of the discontinuities, and the modified whale optimization algorithm (MWOA) is utilized to identify major discontinuity sets and their average orientations. Performance evaluations based on three real cases demonstrate that the proposed method significantly reduces computational time costs without sacrificing accuracy. In particular, the method yields more reasonable extraction results for discontinuities with certain undulations. The presented approach offers a novel tool for efficiently extracting discontinuities from large-scale point clouds.
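The k-d tree plus PCA normal-estimation step of the workflow can be sketched with SciPy and NumPy as below; the neighborhood size and the synthetic planar "outcrop" are assumptions, and the voxel filtering, PWC clustering, and MWOA steps are omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    """Per-point unit normals: eigenvector of the smallest eigenvalue of the local covariance."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)            # k nearest neighbours of each point
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs].T)
        eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
        n = eigvec[:, 0]                        # normal = direction of least variance
        normals[i] = n if n[2] >= 0 else -n     # consistent upward orientation
    return normals

# Toy "outcrop": a noisy horizontal plane, whose normals should point roughly along +z.
rng = np.random.default_rng(5)
pts = np.c_[rng.uniform(0, 10, 2000), rng.uniform(0, 10, 2000), rng.normal(0, 0.05, 2000)]
normals = estimate_normals(pts)
print(np.round(normals.mean(axis=0), 2))
```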
The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared to mechanistic modelling, thus improving the efficiency of process optimization or monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which makes accurate data-driven modelling of distillation processes challenging. This paper proposes a systematic data-driven modelling framework to solve these problems. Firstly, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which partitions the data into perturbed and steady-state intervals for steady-state data extraction. Secondly, the maximal information coefficient (MIC) is employed to calculate the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost) with an error threshold (ET) that improves the weight-update strategy, yielding the new ensemble learning algorithm XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
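A minimal sketch of the KMDI idea as described: a rolling segment variance is added as a feature so that a two-cluster K-means separates perturbed from steady-state intervals. The window length, synthetic signal, and two-cluster choice are assumptions; the MIC feature selection and the XGBoost-AdaBoost-ET learner are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_variance(signal, window=25):
    """Rolling variance of a 1-D process variable, same length as the input."""
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([padded[i:i + window].var() for i in range(len(signal))])

# Synthetic process variable: steady operation with a perturbed stretch in the middle.
rng = np.random.default_rng(6)
y = np.concatenate([rng.normal(10, 0.05, 400),
                    rng.normal(10, 0.8, 200),     # disturbance interval
                    rng.normal(10, 0.05, 400)])

# Cluster [value, segment variance] into two groups; the low-variance cluster is "steady state".
feats = np.c_[y, segment_variance(y)]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
steady_label = np.argmin([feats[labels == c, 1].mean() for c in (0, 1)])
steady_data = y[labels == steady_label]
print(f"steady-state samples: {len(steady_data)} of {len(y)}")
```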
In the era of big data, personalised recommendation systems are essential for enhancing user engagement and driving business growth. However, traditional recommendation algorithms, such as collaborative filtering, face significant challenges due to data sparsity, algorithm scalability, and the difficulty of adapting to dynamic user preferences. These limitations hinder the ability of systems to provide highly accurate and personalised recommendations. To address these challenges, this paper proposes a clustering-based recommendation method that integrates an enhanced Grasshopper Optimisation Algorithm (GOA), termed LCGOA, to improve the accuracy and efficiency of recommendation systems by optimising cluster centroids in a dynamic environment. By combining the K-means algorithm with the enhanced GOA, which incorporates a Lévy flight mechanism and multi-strategy co-evolution, our method overcomes the centroid sensitivity issue, a key limitation in traditional clustering techniques. Experimental results across multiple datasets show that the proposed LCGOA-based method significantly outperforms conventional recommendation algorithms in terms of recommendation accuracy, offering more relevant content to users and driving greater customer satisfaction and business growth.
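The Lévy flight mechanism can be sketched with Mantegna's algorithm, here used to perturb K-means centroids so that occasional long jumps help escape poor local optima. The exponent, step scale, and the rest of LCGOA's multi-strategy co-evolution are assumptions.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(shape, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-distributed steps with exponent beta."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, shape)
    v = rng.normal(0, 1, shape)
    return u / np.abs(v) ** (1 / beta)

# Perturb a (k, d) centroid matrix: mostly small moves, occasionally a long jump.
rng = np.random.default_rng(7)
centroids = rng.normal(size=(5, 8))
centroids_new = centroids + 0.01 * levy_step(centroids.shape, rng=rng)
print(np.abs(centroids_new - centroids).max().round(3))
```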
Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the minimum number of objects within epsilon). However, it cannot handle clusters of varying densities since it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method so that it can discover clusters of varied densities while reducing the required number of input parameters to one. The only user input in the proposed method is MinPts; epsilon, on the other hand, is computed automatically from statistical information of the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon that equals the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects amount to less than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the outstanding ability of the proposed method to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% for the experimented datasets.
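The procedure is spelled out clearly enough to sketch: compute each point's core distance, use the average as epsilon, run DBSCAN, and repeat on the still-unclustered points with a new average until almost everything is clustered. The sketch below is a simplified reading (core distance taken as the distance to the MinPts-th nearest neighbor) rather than the authors' implementation; the 0.006 stopping fraction comes from the abstract.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def multi_density_dbscan(X, min_pts=5, stop_frac=0.006):
    """Repeated DBSCAN passes with epsilon = mean core distance of unclustered points."""
    labels = np.full(len(X), -1)
    unclustered = np.arange(len(X))
    next_label = 0
    while len(unclustered) > max(stop_frac * len(X), min_pts):
        sub = X[unclustered]
        # Core distance: distance to the MinPts-th nearest neighbour (simplified reading).
        dists, _ = NearestNeighbors(n_neighbors=min_pts).fit(sub).kneighbors(sub)
        eps = dists[:, -1].mean()
        sub_labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(sub)
        clustered = sub_labels != -1
        if not clustered.any():
            break                                  # no denser structure left
        labels[unclustered[clustered]] = sub_labels[clustered] + next_label
        next_label = labels.max() + 1
        unclustered = unclustered[~clustered]
    return labels

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 0.2, (200, 2)), rng.normal(3, 1.0, (200, 2))])  # two densities
print(np.unique(multi_density_dbscan(X), return_counts=True))
```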
Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms suffer from the need for manual determination of parameter values, low accuracy, and inconsistent performance with respect to data size and structure. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. The FADBC method consists of two stages: parameter selection and cluster extraction. In the first stage, the proposed method extracts optimal parameters for the dataset, namely the epsilon size and a minimum-number-of-points threshold. These parameters are then used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that the FADBC method outperforms well-known clustering methods such as the agglomerative hierarchical method, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering methods. It handles datasets of various kinds well and performs excellently.
To address the k-means algorithm's drawbacks of requiring the number of clusters to be known in advance and its sensitivity to the selection of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the concept of the silhouette coefficient is introduced, and the optimal cluster number Kopt of a data set with unknown class information is determined by calculating the silhouette coefficients of objects in clusters under different values of K. Then the distribution of the data set is obtained through hierarchical clustering and the initial cluster centers are determined. Finally, the clustering is completed by traditional k-means clustering. Theoretical analysis proves that the improved k-means clustering algorithm has reasonable computational complexity. Experimental results on the Iris test data set show that the algorithm can distinguish different clusters reasonably, recognize outliers efficiently, and produce clusterings with lower entropy.
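The three steps map directly onto standard scikit-learn components, as in the hedged sketch below: silhouette-based selection of Kopt, hierarchical clustering to derive initial centers, then ordinary k-means. The K range and the use of the Iris data are assumptions about details the abstract leaves open.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

X = load_iris().data

# Step 1: pick K_opt as the K with the best average silhouette coefficient.
k_range = range(2, 9)
scores = [silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in k_range]
k_opt = list(k_range)[int(np.argmax(scores))]

# Step 2: hierarchical clustering gives the data's gross structure; its cluster means
# become the initial centres.
hier = AgglomerativeClustering(n_clusters=k_opt).fit_predict(X)
init_centres = np.vstack([X[hier == c].mean(axis=0) for c in range(k_opt)])

# Step 3: ordinary k-means started from those centres.
final = KMeans(n_clusters=k_opt, init=init_centres, n_init=1).fit(X)
print(k_opt, np.bincount(final.labels_))
```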
In order to mine production and security information from security supervising data and to ensure the security and safety involved in production and decision-making, a clustering analysis algorithm for coal-mine security supervising data based on semantic descriptions is studied. First, the hybrid semantic- and numerical-based description method for security supervising data in coal mines is described. Secondly, similarity measures for semantic and numerical data are given separately, and a weight-based hybrid similarity measure for semantically described coal-mine security supervising data is presented. Thirdly, taking the hybrid similarity measure as the distance criterion and drawing on a grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a coal-mine security supervising data set validate the efficiency of the algorithm.
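A minimal sketch of what a weight-based hybrid similarity can look like, assuming the semantic part is a set of descriptive terms compared with a Jaccard-style overlap and the numerical part is a range-normalized Euclidean distance; the actual weighting, semantic encoding, and the grid-based CURE step in the paper may differ.

```python
import numpy as np

def hybrid_distance(num_a, num_b, sem_a, sem_b, w_num=0.6, w_sem=0.4, num_range=None):
    """Weighted mix of normalized numeric distance and semantic (set-overlap) distance."""
    num_a, num_b = np.asarray(num_a, float), np.asarray(num_b, float)
    rng = np.ones_like(num_a) if num_range is None else np.asarray(num_range, float)
    d_num = np.linalg.norm((num_a - num_b) / rng) / np.sqrt(len(num_a))
    inter, union = len(sem_a & sem_b), len(sem_a | sem_b)
    d_sem = 1.0 - (inter / union if union else 1.0)      # Jaccard distance on term sets
    return w_num * d_num + w_sem * d_sem

# Two monitoring records: numeric readings plus semantic tags describing the event.
rec1 = ([0.8, 35.0], {"gas", "roof", "alarm"})
rec2 = ([0.9, 33.0], {"gas", "alarm"})
print(round(hybrid_distance(rec1[0], rec2[0], rec1[1], rec2[1],
                            num_range=[2.0, 50.0]), 3))
```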
Existing support vector machine approaches, when handling a larger number of recognition problems, suffer from high computational complexity and low recognition rates at low SNR. In this paper, the characteristic parameters of the signal are extracted and optimized using a clustering algorithm, and the support vector machine is trained with a grading algorithm to speed up convergence, improve recognition performance at low SNR, and achieve modulation recognition based on the constellation diagram of the modulated signal. Simulation results show that, at low SNR, the average recognition rate of this algorithm is more than 30% higher than that of methods using only a clustering algorithm or only a support vector machine. The average recognition rate reaches 90% at an SNR of 5 dB, and the method is easy to implement, giving it broad application prospects in modulation recognition.
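A generic version of the constellation-based pipeline, assuming the "characteristic parameters" are sorted k-means cluster centers of the received IQ samples and a standard RBF-kernel SVM replaces the paper's grading-trained SVM; the modulation set, SNR, and feature design are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def constellation(points, n_sym=400, snr_db=5, rng=None):
    """Noisy IQ samples drawn from an ideal constellation at the given SNR."""
    rng = rng or np.random.default_rng()
    sym = points[rng.integers(0, len(points), n_sym)]
    noise_std = np.sqrt(np.mean(np.abs(sym) ** 2) / (2 * 10 ** (snr_db / 10)))
    iq = sym + noise_std * (rng.normal(size=n_sym) + 1j * rng.normal(size=n_sym))
    return np.c_[iq.real, iq.imag]

def features(iq, k=16):
    """Sorted k-means cluster centres of the IQ cloud, flattened into a fixed-length vector."""
    c = KMeans(n_clusters=k, n_init=10, random_state=0).fit(iq).cluster_centers_
    return c[np.lexsort((c[:, 1], c[:, 0]))].ravel()

rng = np.random.default_rng(10)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
qam16 = np.array([a + 1j * b for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]) / np.sqrt(10)

X = np.array([features(constellation(m, snr_db=5, rng=rng))
              for m in (qpsk, qam16) for _ in range(40)])
y = np.repeat([0, 1], 40)                       # 0 = QPSK, 1 = 16QAM
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
print("test accuracy:", clf.score(Xte, yte))
```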
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure to keep some information from each iteration for use in the next iteration. Our experimental results demonstrate that the scheme can substantially improve the computational speed of the k-means algorithm, in terms of both the total number of distance calculations and the overall computation time.
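One common way to realize the cached-information idea (which may differ in detail from the paper's data structure) is to store each point's distance to its assigned center: after the centers move, a point whose distance to its own center did not increase keeps its assignment, and only the remaining points get a full distance computation.

```python
import numpy as np

def enhanced_kmeans(X, k, iters=50, seed=0):
    """k-means that caches each point's distance to its centre to skip full recomputation."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)].copy()
    d_all = np.linalg.norm(X[:, None] - centres[None], axis=2)
    assign = d_all.argmin(axis=1)
    cached = d_all[np.arange(len(X)), assign]    # distance to own centre, kept between iterations
    for _ in range(iters):
        new_centres = np.vstack([X[assign == c].mean(axis=0) if (assign == c).any() else centres[c]
                                 for c in range(k)])
        moved = not np.allclose(new_centres, centres)
        centres = new_centres
        d_own = np.linalg.norm(X - centres[assign], axis=1)
        stay = d_own <= cached                   # heuristic: own centre got closer, keep assignment
        cached[stay] = d_own[stay]
        redo = np.where(~stay)[0]                # only these points get all-centre distances
        if redo.size:
            d_sub = np.linalg.norm(X[redo, None] - centres[None], axis=2)
            assign[redo] = d_sub.argmin(axis=1)
            cached[redo] = d_sub.min(axis=1)
        if not moved:
            break
    return centres, assign

X = np.random.default_rng(9).normal(size=(400, 2))
centres, labels = enhanced_kmeans(X, k=3)
print(np.bincount(labels))
```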