To address the problem that flight plans between Chinese city pairs typically rely on a single route, which leaves no alternative paths and makes it hard to respond to emergencies, this study employs the "quantile-inflection point method" to analyze specific deviation trajectories, determine deviation thresholds, and identify commonly used deviation paths. By combining multiple similarity metrics, including Euclidean distance, Hausdorff distance, and sector edit distance, with the density-based spatial clustering of applications with noise (DBSCAN) algorithm, the study clusters deviation trajectories to construct a multi-option trajectory set for city pairs. A case study of 23,578 flight trajectories between the Guangzhou airport cluster and the Shanghai airport cluster demonstrates the effectiveness of the proposed framework. Experimental results show that sector edit distance achieves better clustering performance than Euclidean and Hausdorff distances, with higher silhouette coefficients and lower Davies-Bouldin indices, indicating better intra-cluster compactness and inter-cluster separation. Based on the clustering results, 19 representative trajectory options are identified, covering both nominal and deviation paths, which significantly enhance route diversity and reflect actual flight practice. This provides a practical basis for optimizing flight paths and scheduling and increases the flexibility of route selection for flights between city pairs.
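The combination of a trajectory similarity metric with DBSCAN described in the abstract above can be illustrated with a minimal sketch. It uses only the Hausdorff distance (the sector edit distance is not defined in the abstract, so it is not attempted here), a small hand-written DBSCAN over a precomputed distance matrix, and toy 2-D trajectories. The data, the `eps` and `min_pts` values, and all function names are assumptions made for illustration, not the study's implementation.

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sequences of points."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = [k for k in range(n) if dist[j][k] <= eps]
            if len(nbrs_j) >= min_pts:
                queue.extend(nbrs_j)  # j is a core point, keep expanding
    return labels

# Toy trajectories (hypothetical): two bundles of similar paths and one outlier.
trajectories = [
    [(0, 0.0), (1, 0.0), (2, 0.0)],
    [(0, 0.1), (1, 0.1), (2, 0.1)],
    [(0, 0.2), (1, 0.2), (2, 0.2)],
    [(0, 5.0), (1, 5.0), (2, 5.0)],
    [(0, 5.1), (1, 5.1), (2, 5.1)],
    [(0, 20.0), (1, 25.0), (2, 20.0)],
]
dist = [[hausdorff(a, b) for b in trajectories] for a in trajectories]
labels = dbscan(dist, eps=0.5, min_pts=2)
print(labels)
```

Swapping in another metric only requires changing how `dist` is filled, which is why a precomputed matrix is convenient when several similarity measures are being compared.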
This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables that characterize the "Higgs machine learning challenge 2014" data set. The analysis proceeds in four steps: (1) selection of the most informative features from the data; (2) determination of the number of clusters based on the elbow criterion, for which the experimental results showed that the optimal unsupervised grouping of the data corresponds to 2 clusters; (3) a new approach that hybridizes both hard and fuzzy clustering tuned with Ant Lion Optimization (ALO); (4) comparison with existing metaheuristic optimizations such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Using a multi-angle analysis based on cluster validation indices, the confusion matrix, efficiency and purity rates, average cost variation, computational time, and Sammon mapping visualization, the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized with ALO (ALOGK) and validate the proposed approach. Although the paper gives a complete clustering analysis, its novel contribution concerns only Steps (1) and (3). The first contribution lies in the method used in Step (1) to select the most informative features and variables. We used the t-statistic technique to rank them. Afterwards, a feature mapping is applied using a Self-Organizing Map (SOM) to identify the level of correlation between them. Then, Particle Swarm Optimization (PSO), a metaheuristic optimization technique, is used to reduce the dimension of the data set. The second contribution of this work concerns Step (3), where each of the clustering algorithms K-means (KM), Global K-means (GlobalKM), Partitioning Around Medoids (PAM), Fuzzy C-means (FCM), Gustafson-Kessel (GK), and Gath-Geva (GG) is optimized and tuned with ALO.
The proportion of new energy in the power grid is increasing, and the random fluctuations in its power output increase the risk of cascading failures. In this paper, we propose a method for identifying high-risk cascading failure scenarios in new energy power grids based on a deep embedding clustering (DEC) algorithm and apply it to the risk assessment of cascading failures in different operating scenarios. First, considering the real-time operating status and system structure of new energy power grids, a scenario cascading failure risk indicator is established. Based on this indicator, the risk of cascading failure is calculated for the scenario set, the scenarios are clustered with the DEC algorithm, and the scenarios with the highest indicators are selected as the significant risk scenario set. Simulation results on an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate linguistic criteria that result from experience-based judgments and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied to the SMR classification system, no prior experience-based judgment was made about the number of extracted classes; only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system with ratings calculated via continuous and discrete functions.
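To make concrete how fuzzy c-means softens a class boundary, the following minimal sketch computes the standard FCM membership of a single rating in each cluster, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)). The cluster centers, the rating value, and the fuzzifier m = 2 are hypothetical values chosen for illustration; they are not taken from the study.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of a scalar x in each cluster center."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with one or more centers: split full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical cluster centers on a 0-100 rating scale, and one rating that
# falls exactly halfway between the first two centers.
centers = [30.0, 50.0, 70.0]
memberships = fcm_memberships(40.0, centers)
print(memberships)
```

A rating midway between two centers receives equal membership in both instead of being forced into one class, which is the kind of boundary uncertainty that a crisp classification cannot express.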
Most clustering algorithms describe the similarity of objects with a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results; comparing the results obtained with different functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to apply different distance functions separately, compare the clustering results, and pick out the "swing points", since such points may reveal more information to data analysts.
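The effect described above, a point whose cluster assignment changes with the distance function, can be shown in a few lines. The abstract does not name its three distance functions; Euclidean, Manhattan, and Chebyshev are assumed here purely for illustration, and the point and centers are toy values.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def chebyshev(p, q):
    return max(abs(x - y) for x, y in zip(p, q))

def nearest(point, centers, metric):
    """Name of the center closest to point under the given metric."""
    return min(centers, key=lambda name: metric(point, centers[name]))

# Toy cluster centers and a point lying between them (hypothetical values).
centers = {"a": (3.0, 3.0), "b": (0.0, 4.0)}
point = (0.0, 0.0)

assignment = {
    metric.__name__: nearest(point, centers, metric)
    for metric in (euclidean, manhattan, chebyshev)
}
print(assignment)
```

Here the point is assigned to center `b` under the Euclidean and Manhattan distances but to center `a` under the Chebyshev distance, so it is a candidate for closer inspection when results from several metrics are compared.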
Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Clustering algorithms are therefore in high demand to classify these MD snapshots and gain biological insights. There are two main categories of clustering algorithms, which group protein conformations into clusters based on the similarity of their shape (geometric clustering) or their kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics are discussed along with the performance of the different clustering algorithms. We note that there is no one-size-fits-all algorithm for the classification of MD datasets. For a specific application, the choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. A main focus of our review is therefore to describe the merits and limitations of each clustering algorithm. We expect this review to help researchers choose appropriate clustering algorithms for their own MD datasets.
Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, delivery companies face growing pressure to organise their transportation plans so as to achieve both profit and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers needs to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms combined with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
Accurately perceiving the performance degradation of a fuel cell is very important for detecting its state of health. However, inconsistent operating conditions of fuel cell vehicles during testing introduce errors into the data. To obtain a more credible degradation rate, this study proposes a novel method that groups experimental data collected under different working conditions into similar operating conditions by using dimensionality reduction and clustering algorithms. First, because the experimental data collected from fuel cell vehicles are high-dimensional, they are projected into a three-dimensional feature space via principal component analysis (PCA). The resulting three-dimensional feature vectors are then input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be grouped together. Finally, the selected voltage data can be used to represent precisely the true performance degradation of an on-board fuel cell stack. The results show that the voltage selected with the K-means algorithm declines fastest, followed by that selected with the DBSCAN algorithm and then the original data, which indicates that the performance of the fuel cell actually declines faster than the original data suggest. Early intervention can prolong its life to the greatest extent.
Active semi-supervised fuzzy clustering integrates fuzzy clustering techniques with limited labeled data, guided by active learning, to enhance classification accuracy, particularly in complex and ambiguous datasets. Although several active semi-supervised fuzzy clustering methods have been developed previously, they typically face significant limitations, including high computational complexity, sensitivity to initial cluster centroids, and difficulties in accurately managing boundary clusters where data points often overlap among multiple clusters. This study introduces a novel Active Semi-Supervised Fuzzy Clustering algorithm specifically designed to identify, analyze, and correct misclassified boundary elements. By strategically utilizing labeled data through active learning, our method improves the robustness and precision of cluster boundary assignments. Extensive experimental evaluations conducted on three types of datasets (benchmark UCI datasets, synthetic data with controlled boundary overlap, and satellite imagery) demonstrate that our proposed approach achieves superior performance in terms of clustering accuracy and robustness compared to existing active semi-supervised fuzzy clustering methods. The results confirm the effectiveness and practicality of our method in handling real-world scenarios where precise cluster boundaries are critical.
The characterization and clustering of rock discontinuity sets is a crucial and challenging task in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, no literature summarizes these achievements, and this paper attempts to describe the current status and prospects of the field. Specifically, this review discusses the development of clustering methods for discontinuity sets and the relevant state-of-the-art algorithms. First, we introduce the importance of discontinuity clustering analysis and then the approaches for comprehensively characterizing discontinuity data. A bibliometric analysis is subsequently conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It is emphasized that current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that the study of multi-parameter clustering methods needs to be extended. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only balances the load among sensor nodes but also achieves a fairly uniform distribution of cluster heads across the network. Simulation results demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Transient stability assessment (TSA) based on artificial intelligence typically follows one of two model management approaches: a unified model for all faulted lines or a separate model for each faulted line. To address the shortcomings of these approaches in accuracy, training time, and model management complexity, a multi-model management approach for power system TSA based on multi-moment feature clustering is proposed. First, the steady-state and transient features present under fault conditions were obtained through transient simulation of line faults. The input sample set was then constructed from these multi-moment electrical features and the embedded faulty line numbers. Subsequently, the lines were clustered with K-means based on the similarity of their electrical features, after t-SNE dimensionality reduction. A PSO-CNN model was trained separately for each cluster to generate several independent TSA models. Finally, a model effectiveness evaluation system consisting of five metrics was established, and the effect of the sample imbalance ratio on model effectiveness was investigated. The model effectiveness was evaluated on the IEEE 39-bus system. The results showed that the multi-model management strategy based on multi-moment feature clustering can combine superior evaluation performance with streamlined model management by fully extracting system features. Moreover, this approach allows more flexible adjustment to changes in line topology.
Wireless Sensor Networks (WSNs), as a crucial component of the Internet of Things (IoT), are widely used in environmental monitoring, industrial control, and security surveillance. However, WSNs still face challenges such as inaccurate node clustering, low energy efficiency, and shortened network lifespan in practical deployments, which significantly limit their large-scale application. To address these issues, this paper proposes an Adaptive Chaotic Ant Colony Optimization algorithm (AC-ACO), aiming to optimize the energy utilization and system lifespan of WSNs. AC-ACO combines the path-planning capability of Ant Colony Optimization (ACO) with the dynamic characteristics of chaotic mapping and introduces an adaptive mechanism to enhance the algorithm's flexibility and adaptability. By dynamically adjusting the pheromone evaporation factor and heuristic weights, efficient node clustering is achieved. Additionally, a chaotic mapping initialization strategy is employed to enhance population diversity and avoid premature convergence. To validate the algorithm's performance, this paper compares AC-ACO with clustering methods such as Low-Energy Adaptive Clustering Hierarchy (LEACH), ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Simulation results demonstrate that AC-ACO outperforms the compared algorithms in key metrics such as energy consumption optimization, network lifetime extension, and communication delay reduction, providing an efficient solution for improving energy efficiency and ensuring long-term stable operation of wireless sensor networks.
Recognizing discontinuities within rock masses is a critical aspect of rock engineering. The development of remote sensing technologies has significantly enhanced the quality and quantity of the point clouds collected from rock outcrops. In response, we propose a workflow that balances accuracy and efficiency to extract discontinuities from massive point clouds. The proposed method employs voxel filtering to downsample point clouds, constructs a point cloud topology using K-d trees, utilizes principal component analysis to calculate the point cloud normals, and employs the pointwise clustering (PWC) algorithm to extract discontinuities from rock outcrop point clouds. This method provides information on the location and orientation (dip direction and dip angle) of the discontinuities, and the modified whale optimization algorithm (MWOA) is utilized to identify major discontinuity sets and their average orientations. Performance evaluations based on three real cases demonstrate that the proposed method significantly reduces computational time costs without sacrificing accuracy. In particular, the method yields more reasonable extraction results for discontinuities with certain undulations. The presented approach offers a novel tool for efficiently extracting discontinuities from large-scale point clouds.
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that our scheme can improve the computational speed of the k-means algorithm, as measured by the total number of distance calculations and the overall computation time.
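The idea of carrying information from one iteration to the next can be sketched as follows. The abstract does not say what is stored; this sketch assumes that each point keeps its current label and its distance to the assigned center, and that a point stays where it is, without a full scan of all centers, whenever its own center has not moved further away. That shortcut is a heuristic reading rather than the paper's stated method, and it may not always reproduce the result of plain k-means. The function name, data, and initial centers are hypothetical.

```python
import math

def enhanced_kmeans(points, centers, max_iter=100):
    """k-means that reuses cached point-to-center distances between iterations."""
    centers = [tuple(c) for c in centers]

    def full_scan(p):
        # Distance from p to every current center; returns (label, distance).
        dists = [math.dist(p, c) for c in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        return k, dists[k]

    assigned = [full_scan(p) for p in points]
    labels = [k for k, _ in assigned]
    cached = [d for _, d in assigned]

    for _ in range(max_iter):
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers

        # Assignment step: only rescan points whose own center moved away.
        for i, p in enumerate(points):
            d_own = math.dist(p, centers[labels[i]])
            if d_own <= cached[i]:
                cached[i] = d_own  # one distance computed instead of k
            else:
                labels[i], cached[i] = full_scan(p)
    return labels, centers

# Toy data: two well-separated groups (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = enhanced_kmeans(points, [(0, 0), (10, 10)])
print(labels, centers)
```

The saving comes from the assignment step: a point that passes the cached-distance check costs one distance calculation rather than one per center.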
To mine production and security information from security supervising data and to ensure safety in production and decision-making, a clustering analysis algorithm for coal mine security supervising data based on a semantic description is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, similarity measurement methods for semantic data and for numerical data are given separately, and a weight-based hybrid similarity measurement method for the semantically described security supervising data is presented. Third, taking the hybrid similarity measurement as the distance criterion and borrowing from grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a coal mine security supervising data set validate the efficiency of the algorithm.
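A weight-based hybrid similarity of the kind outlined above can be sketched as a convex combination of a semantic score and a numerical score. The abstract does not specify either component; the Jaccard overlap of tag sets, the range-normalised numerical similarity, the record layout, and the weight of 0.5 below are all assumptions made for illustration.

```python
def numeric_similarity(a, b, value_range):
    """1 when the values are equal, 0 when they differ by the full range."""
    return 1.0 - abs(a - b) / value_range

def semantic_similarity(a, b):
    """Jaccard overlap between two sets of semantic tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def hybrid_similarity(rec1, rec2, weight=0.5, value_range=100.0):
    """Weighted mix of semantic and numerical similarity; weight is the semantic share."""
    num = numeric_similarity(rec1["value"], rec2["value"], value_range)
    sem = semantic_similarity(rec1["tags"], rec2["tags"])
    return weight * sem + (1.0 - weight) * num

# Hypothetical monitoring records: a sensor reading plus descriptive tags.
r1 = {"value": 20.0, "tags": {"gas", "high"}}
r2 = {"value": 30.0, "tags": {"gas", "low"}}
score = hybrid_similarity(r1, r2)
print(score)
```

A clustering algorithm that expects a distance can use `1.0 - hybrid_similarity(...)`, and the weight controls how strongly the semantic description influences the grouping.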
This article proposes a novel stable clustering design method for hierarchical satellite networks, based on a 5-dimensional vector model, in order to increase network stability, reduce storage overhead, and control delay performance effectively. According to a stability measurement function, and subject to the constraint of minimal average routing table length, the hierarchical satellite network is grouped into separate, stable, connected clusters to improve destruction resistance and reconstruction ability in the future integrated network. In each cluster, redundant communication links that contribute little to network stability and have only a slight influence on delay variation are deleted, so that the requirements for stability and connectivity are met with optimal link resources. In addition, the idea of logical weight is introduced to select the optimal satellites for communicating with satellites in neighboring clusters. Finally, the feasibility and effectiveness of the proposed method are verified by comparing its simulated performance with that of two other typical hierarchical satellite networks, double layer satellite constellation (DLSC) and satellite over satellite (SOS).
Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. In intelligent data analysis applications, the regularities found are mostly represented as IF-THEN rules. These rules are used to solve tasks such as prediction, classification, and pattern recognition. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This makes it possible to interpret the data, find relationships in them, and extract new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set, and the effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Data clustering is a significant information retrieval technique in today's data-intensive society. Over the last few decades, a large number and wide variety of data clustering algorithms have been designed and implemented for almost all data types. The quality of the results of a cluster analysis depends mainly on the clustering algorithm used. The architecture of a versatile, less user-dependent, dynamic, and scalable data clustering machine is presented. The machine selects the best available data clustering algorithm for the analysis on the basis of the characteristics of the data and previously used domain knowledge. The domain knowledge is updated on completion of each data analysis session.
Funding (city-pair flight trajectory study): Supported in part by Boeing Company and Nanjing University of Aeronautics and Astronautics (NUAA) through the Research on Decision Support Technology of Air Traffic Operation Management in Convective Weather under Project 2022-GT-129, and in part by the Postgraduate Research and Practice Innovation Program of NUAA (No. xcxjh20240709).
Funding: Funded by the State Grid Limited Science and Technology Project of China, Grant Number SGSXDK00DJJS2200144.
Abstract: The proportion of new energy in the power grid is increasing, and random fluctuations in power output raise the risk of cascading failures in the grid. In this paper, we propose a method for identifying high-risk cascading-failure scenarios in new energy power grids based on a deep embedding clustering (DEC) algorithm, and we apply it to the risk assessment of cascading failures in different operating scenarios. First, considering the real-time operation status and system structure of new energy power grids, a scenario cascading failure risk indicator is established. Based on this indicator, the risk of cascading failure is calculated for the scenario set, the scenarios are clustered with the DEC algorithm, and the scenarios with the highest indicators are selected as the significant risk scenario set. Simulation results on an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate linguistic criteria that stem from experience-based judgments and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. With the clustering algorithms, no experience-based judgment was made in advance on the number of classes extracted. Only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system with ratings calculated via continuous and discrete functions.
Abstract: Most clustering algorithms need to describe the similarity of objects with a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results, and pick out the “swing points”. Such points may reveal more information to data analysts.
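Illustrative sketch (not taken from the paper): the abstract does not name its three distance functions, so the example below assumes Euclidean, Manhattan, and Chebyshev distances. It assigns a few made-up points to two fixed centers under each metric and reports the points whose assignment changes with the metric, which is one possible reading of “swing points”.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def assign(points, centers, dist):
    """Index of the nearest center for each point under the given distance."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]


# Hypothetical centers and points, chosen so that one point is ambiguous.
centers = [(0.0, 0.0), (3.0, 2.0)]
points = [(0.5, 0.5), (3.0, 2.5), (2.0, 0.2)]

metrics = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}
assignments = {name: assign(points, centers, d) for name, d in metrics.items()}

# A point is a swing point when the metrics disagree on its cluster.
swing_points = [i for i in range(len(points))
                if len({labels[i] for labels in assignments.values()}) > 1]

for name, labels in assignments.items():
    print(name, labels)
print("swing point indices:", swing_points)
```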
Funding: Supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20170413173837121), the Hong Kong Research Grant Council (HKUST C6009-15G, 14203915, 16302214, 16304215, 16318816, and AoE/P-705/16), King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) (OSR-2016-CRG5-3007), the Guangzhou Science Technology and Innovation Commission (201704030116), and the Innovation and Technology Commission (ITCPD/17-9 and ITC-CNERC14SC01).
Abstract: Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Clustering algorithms are therefore in high demand to classify these MD snapshots and gain biological insights. There are two main categories of clustering algorithms, which group protein conformations into clusters based on the similarity of their shape (geometric clustering) or their kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics are discussed along with the performance of the different clustering algorithms. We note that there is no one-size-fits-all algorithm for the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review will help researchers choose appropriate clustering algorithms for their own MD datasets.
Funding: Deanship of Scientific Research, for funding and supporting this research through the initiative of DSR Graduate Students Research Support (GSR).
Abstract: Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers needs to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
Funding: Supported by the special key project of Chongqing technological innovation and application development (cstc2019jscx-zdztzxX0033), the national key R&D plan of the Ministry of Science and Technology (sub project) (2018YFB0105400), and the National Natural Science Foundation of China (21908142).
Abstract: Accurately tracking the performance degradation of a fuel cell is very important for assessing its state of health. However, inconsistent operating conditions of fuel cell vehicles during testing introduce errors into the data. To obtain a more credible degradation rate, this study proposes a novel method that groups experimental data collected under different working conditions into similar operating conditions by using dimensionality reduction and clustering algorithms. First, the high-dimensional experimental data collected from fuel cell vehicles are projected into a three-dimensional feature vector space via principal component analysis (PCA). The dimension-reduced three-dimensional feature vectors are then input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be classified. Finally, the selected voltage data can be used to precisely represent the true performance degradation of an on-board fuel cell stack. The results show that the voltage selected with the K-means algorithm declines the fastest, followed by the DBSCAN algorithm and then the original data, which indicates that the performance of the fuel cell actually declines faster than the original data suggest. Early intervention can prolong its life to the greatest extent.
Abstract: Active semi-supervised fuzzy clustering integrates fuzzy clustering techniques with limited labeled data, guided by active learning, to enhance classification accuracy, particularly in complex and ambiguous datasets. Although several active semi-supervised fuzzy clustering methods have been developed previously, they typically face significant limitations, including high computational complexity, sensitivity to initial cluster centroids, and difficulties in accurately managing boundary clusters where data points often overlap among multiple clusters. This study introduces a novel Active Semi-Supervised Fuzzy Clustering algorithm specifically designed to identify, analyze, and correct misclassified boundary elements. By strategically utilizing labeled data through active learning, our method improves the robustness and precision of cluster boundary assignments. Extensive experimental evaluations conducted on three types of datasets (benchmark UCI datasets, synthetic data with controlled boundary overlap, and satellite imagery) demonstrate that our proposed approach achieves superior performance in terms of clustering accuracy and robustness compared to existing active semi-supervised fuzzy clustering methods. The results confirm the effectiveness and practicality of our method in handling real-world scenarios where precise cluster boundaries are critical.
Funding: Funding support from the National Natural Science Foundation of China (Grant No. 42007269), the Young Talent Fund of Xi'an Association for Science and Technology (Grant No. 959202313094), and the Fundamental Research Funds for the Central Universities, CHD (Grant No. 300102263401).
Abstract: The characterization and clustering of rock discontinuity sets are crucial and challenging tasks in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, no literature has summarized these achievements, and this paper attempts to describe the current status and prospects of the field. Specifically, this review discusses the development of clustering methods for discontinuity sets and the state-of-the-art algorithms. First, we introduce the importance of discontinuity clustering analysis, followed by the comprehensive characterization approaches for discontinuity data. A bibliometric analysis is then conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It is emphasized that the current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that the study of multi-parameter clustering methods needs to be extended. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60872018, 60721002, and 60875038, the National Basic Research 973 Program of China under Grant No. 2007CB310607, the SRFDP Project under Grant No. 20070293001, the Science and Technology Support Foundation of Jiangsu Province under Grant Nos. BE2009142 and BE2010180, and the Scientific Research Foundation of Graduate School of Nanjing University under Grant No. 2011CL07.
Abstract: Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks that can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only balances load among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results demonstrate that our algorithm is more energy-efficient than classical ones. The algorithm can also be easily extended to generate a hierarchy of cluster heads for better network management and energy efficiency.
Funding: Supported by the Science and Technology Project of SGCC (5100-202199558A-0-5-ZN).
Abstract: Transient stability assessment (TSA) based on artificial intelligence typically follows one of two model management approaches: a unified management approach for all faulted lines, or a separate management approach for each faulted line. To address the shortcomings of these approaches in accuracy, training time, and model management complexity, a multi-model management approach for power system TSA based on multi-moment feature clustering is proposed. First, the steady-state and transient features present under fault conditions were obtained through a transient simulation of line faults. The input sample set was then constructed using these multi-moment electrical features and the embedded faulty line numbers. Subsequently, K-means clustering was conducted on the lines based on the similarity of their electrical features, employing t-SNE dimensionality reduction. The PSO-CNN model was trained separately for each cluster to generate several independent TSA models. Finally, a model effectiveness evaluation system consisting of five metrics was established, and the effect of the sample imbalance ratio on model effectiveness was investigated. The model effectiveness was evaluated on the IEEE 39-bus system. The results showed that the multi-model management strategy based on multi-moment feature clustering can effectively combine superior evaluation performance with streamlined model management by fully extracting system features. Moreover, this approach allows for more flexible adjustments to line topology changes.
Funding: Funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 22D01B148), Bidding Topics for the Center for Integration of Education and Production and Development of New Business in 2024 (No. 2024-KYJD05), the Basic Scientific Research Business Fee Project of Colleges and Universities in Autonomous Region (No. XJEDU2025P126), and the Xinjiang College of Science & Technology School-level Scientific Research Fund Project (No. 2024-KYTD01).
Abstract: Wireless Sensor Networks (WSNs), as a crucial component of the Internet of Things (IoT), are widely used in environmental monitoring, industrial control, and security surveillance. However, WSNs still face challenges such as inaccurate node clustering, low energy efficiency, and shortened network lifespan in practical deployments, which significantly limit their large-scale application. To address these issues, this paper proposes an Adaptive Chaotic Ant Colony Optimization algorithm (AC-ACO), aiming to optimize the energy utilization and system lifespan of WSNs. AC-ACO combines the path-planning capability of Ant Colony Optimization (ACO) with the dynamic characteristics of chaotic mapping and introduces an adaptive mechanism to enhance the algorithm's flexibility and adaptability. By dynamically adjusting the pheromone evaporation factor and heuristic weights, efficient node clustering is achieved. Additionally, a chaotic mapping initialization strategy is employed to enhance population diversity and avoid premature convergence. To validate the algorithm's performance, this paper compares AC-ACO with clustering methods such as Low-Energy Adaptive Clustering Hierarchy (LEACH), ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Simulation results demonstrate that AC-ACO outperforms the compared algorithms in key metrics such as energy consumption optimization, network lifetime extension, and communication delay reduction, providing an efficient solution for improving energy efficiency and ensuring long-term stable operation of wireless sensor networks.
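Illustrative sketch (not taken from the paper): the abstract mentions a chaotic mapping initialization strategy but does not say which map is used. The example below assumes the logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4, a common choice, and scales its iterates into a search range. The function names, the seed `x0`, and the bounds are hypothetical.

```python
def logistic_sequence(x0, n, r=4.0):
    """Return n successive iterates of the logistic map starting from x0."""
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_population(size, dim, lower, upper, x0=0.37):
    """Build `size` candidate vectors of length `dim` from chaotic iterates.

    Iterates lie in [0, 1] for r = 4 and x0 in (0, 1), so each coordinate
    is mapped linearly into [lower, upper].
    """
    seq = logistic_sequence(x0, size * dim)
    return [[lower + (upper - lower) * seq[i * dim + d] for d in range(dim)]
            for i in range(size)]


# Hypothetical use: 5 candidate positions in a 100 m x 100 m field.
population = chaotic_population(size=5, dim=2, lower=0.0, upper=100.0)
for individual in population:
    print([round(v, 2) for v in individual])
```

The seed should avoid the map's fixed points and preimages of them (for example 0, 0.25, 0.5, 0.75), which collapse the sequence to a constant.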
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42407232) and the Sichuan Science and Technology Program (Grant No. 2024NSFSC0826).
Abstract: Recognizing discontinuities within rock masses is a critical aspect of rock engineering. The development of remote sensing technologies has significantly enhanced the quality and quantity of the point clouds collected from rock outcrops. In response, we propose a workflow that balances accuracy and efficiency to extract discontinuities from massive point clouds. The proposed method employs voxel filtering to downsample point clouds, constructs a point cloud topology using K-d trees, utilizes principal component analysis to calculate the point cloud normals, and employs the pointwise clustering (PWC) algorithm to extract discontinuities from rock outcrop point clouds. This method provides information on the location and orientation (dip direction and dip angle) of the discontinuities, and the modified whale optimization algorithm (MWOA) is utilized to identify major discontinuity sets and their average orientations. Performance evaluations based on three real cases demonstrate that the proposed method significantly reduces computational time costs without sacrificing accuracy. In particular, the method yields more reasonable extraction results for discontinuities with certain undulations. The presented approach offers a novel tool for efficiently extracting discontinuities from large-scale point clouds.
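Illustrative sketch (not taken from the paper): the first step of the workflow, voxel filtering, can be pictured as bucketing points into a regular grid and keeping one representative per occupied cell. The version below keeps the centroid of each cell, which is one common variant; the abstract does not state which representative the authors use. The function name and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer grid index of the voxel containing p.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # One centroid per occupied voxel, in first-seen voxel order.
    return [tuple(sum(axis) / len(cell) for axis in zip(*cell))
            for cell in buckets.values()]


# Hypothetical cloud: two points share a voxel, the third is alone.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 1.5, 1.5)]
reduced = voxel_downsample(cloud, voxel_size=1.0)
print(len(cloud), "->", len(reduced), "points")
```

A larger `voxel_size` removes more points and speeds up the later normal estimation and clustering, at the cost of smoothing out small surface undulations.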
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next iteration. Our experimental results demonstrate that our scheme can improve the computational speed of the k-means algorithm by an order of magnitude in the total number of distance calculations and the overall computation time.
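Illustrative sketch (one possible reading of the abstract, not the authors' code): the "information kept from each iteration" is assumed here to be each point's current label and its distance to that center. After the centers move, a point whose distance to its own center has not grown keeps its label at the cost of a single distance calculation, and only the remaining points are compared against all k centers. This shortcut is a heuristic and can occasionally differ from the exact nearest-center assignment. All names and the toy data are hypothetical.

```python
import math


def _means(points, labels, old_centers):
    """Recompute each center as the mean of its members (keep empty ones)."""
    new_centers = []
    for c, old in enumerate(old_centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new_centers.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def enhanced_kmeans(points, centers, max_iter=100):
    """k-means that caches each point's label and distance between passes.

    Returns (labels, centers, number_of_distance_calculations).
    """
    centers = [tuple(c) for c in centers]
    k = len(centers)
    n_dist = 0
    labels, dists = [], []
    for p in points:                       # first pass: full comparison
        all_d = [math.dist(p, c) for c in centers]
        n_dist += k
        best = min(range(k), key=all_d.__getitem__)
        labels.append(best)
        dists.append(all_d[best])
    for _ in range(max_iter):
        new_centers = _means(points, labels, centers)
        if new_centers == centers:         # centers stable: converged
            break
        centers = new_centers
        for i, p in enumerate(points):
            d_same = math.dist(p, centers[labels[i]])
            n_dist += 1
            if d_same <= dists[i]:         # no farther than before: keep label
                dists[i] = d_same
                continue
            all_d = [math.dist(p, c) for c in centers]
            n_dist += k
            best = min(range(k), key=all_d.__getitem__)
            labels[i], dists[i] = best, all_d[best]
    return labels, centers, n_dist


# Hypothetical data: two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, n_dist = enhanced_kmeans(data, [(0, 0), (10, 10)])
print(labels, n_dist)
```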
Funding: Supported by the National Natural Science Foundation of China (61304256), the Zhejiang Provincial Natural Science Foundation of China (LQ13F030013), the Project of the Education Department of Zhejiang Province (Y201327006), the Young Researchers Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering and Zhejiang Sci-Tech University Key Laboratory (ZSTUME01B15), the New Century 151 Talent Project of Zhejiang Province, the 521 Talent Project of Zhejiang Sci-Tech University, and the Young and Middle-aged Talents Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering.
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: To mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data in coal mines based on a semantic description is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, similarity measurement methods for semantic data and for numerical data are given separately, and a weight-based hybrid similarity measurement method for the security supervising data is presented. Third, taking the hybrid similarity measurement as the distance criterion and drawing on a grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
Funding: National Natural Science Foundation of China (60532030).
Abstract: This article proposes a novel stable clustering design method for hierarchical satellite networks, based on a 5-dimensional vector model, in order to increase stability, reduce storage overhead, and effectively control delay performance. According to the stability measurement function, and subject to the limit on the minimal average routing table length, the hierarchical satellite network is grouped into separate stable connected clusters to improve destruction resistance and reconstruction ability in the future integrated network. In each cluster, redundant communication links that contribute little to network stability and have only a slight influence on delay variation are deleted, so that the requirements for stability and connectivity are met with optimal link resources. In addition, the idea of logical weight is introduced to select the optimal satellites used to communicate with satellites in neighboring clusters. Finally, the feasibility and effectiveness of the proposed method are verified by comparing it with the simulated performance of two other typical hierarchical satellite networks, double layer satellite constellation (DLSC) and satellite over satellite (SOS).
Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented as IF-THEN rules. These rules are used to solve tasks such as prediction, classification, and pattern recognition. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data, and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. Clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Abstract: Data clustering is a significant information retrieval technique in today's data-intensive society. Over the last few decades, a vast variety of data clustering algorithms have been designed and implemented for almost all data types. The quality of the results of cluster analysis depends mainly on the clustering algorithm used. The architecture of a versatile, less user-dependent, dynamic, and scalable data clustering machine is presented. The machine selects the best available data clustering algorithm for the analysis on the basis of the characteristics of the data and previously used domain knowledge. The domain knowledge is updated on completion of each session of data analysis.