This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables that characterize the “Higgs machine learning challenge 2014” data set. The unsupervised detection proceeds through four steps: (1) selection of the most informative features from the considered data; (2) definition of the number of clusters based on the elbow criterion, where the experimental results showed that the optimal number of clusters for grouping the considered data in an unsupervised manner is 2; (3) proposal of a new approach for hybridizing both hard and fuzzy clustering tuned with Ant Lion Optimization (ALO); (4) comparison with existing metaheuristic optimizations such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Using a multi-angle analysis based on cluster validation indices, the confusion matrix, efficiency and purity rates, the average cost variation, the computational time, and the Sammon mapping visualization, the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized with ALO (ALOGK) and validate the proposed approach. Although the paper gives a complete clustering analysis, its novel contributions concern only Steps (1) and (3) above. The first contribution lies in the method used in Step (1) to select the most informative features and variables: the t-statistic technique is used to rank them; afterwards, a feature mapping is applied using a Self-Organizing Map (SOM) to identify the level of correlation between them; then Particle Swarm Optimization (PSO), a metaheuristic optimization technique, is used to reduce the dimension of the data set. The second contribution concerns Step (3), where each of the clustering algorithms, namely K-means (KM), Global K-means (GlobalKM), Partitioning Around Medoids (PAM), Fuzzy C-means (FCM), Gustafson-Kessel (GK), and Gath-Geva (GG), is optimized and tuned with ALO.
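The elbow criterion mentioned in Step (2) of the Higgs abstract can be illustrated with a small sketch. The abstract does not say how the elbow is located, so the "largest second difference of the within-cluster sum of squares" rule, the farthest-point initialisation, the toy data, and all function names below are assumptions made for illustration only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_wcss(points, k, iters=20):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow(wcss):
    """Return the k with the sharpest bend; wcss[i] is the cost for k = i + 1."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Toy data: two compact groups, standing in for the real feature vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
wcss = [kmeans_wcss(points, k) for k in range(1, 6)]
best_k = elbow(wcss)
print(best_k)
```

On this toy data the cost drops sharply from one to two clusters and flattens afterwards, so the bend is at two clusters; real data usually gives a less clear-cut curve.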
At present, the proportion of new energy in the power grid is increasing, and the random fluctuations in power output increase the risk of cascading failures in the power grid. In this paper, we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering (DEC) algorithm and apply it to a risk assessment of cascading failures in different operating scenarios for new energy power grids. First, considering the real-time operation status and system structure of new energy power grids, a scenario cascading failure risk indicator is established. Based on this indicator, the risk of cascading failure is calculated for the scenario set, the scenarios are clustered with the DEC algorithm, and the scenarios with the highest indicators are selected as the significant risk scenario set. Simulation results for an example power grid show that the method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate the linguistic criteria that result from experience-based judgments and to account for uncertainties in determining the class boundaries of the SMR system, the classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied to the SMR classification system, no prior experience-based judgment was made on the number of extracted classes; only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
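For readers unfamiliar with how fuzzy c-means softens class boundaries, the textbook update rules are reproduced below. The abstract does not give them, so the notation (ratings $x_i$, class centers $v_j$, number of classes $c$, fuzzifier $m > 1$) is assumed here rather than taken from the paper.

```latex
u_{ij} = \left[ \sum_{k=1}^{c}
  \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1},
\qquad
v_j = \frac{\sum_{i=1}^{n} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{n} u_{ij}^{\,m}}
```

Each rating receives a membership degree $u_{ij} \in [0, 1]$ in every class, with $\sum_{j} u_{ij} = 1$, so a rating close to a class boundary is shared between neighbouring classes instead of being forced into one of them.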
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to use different distance functions separately, compare the clustering results, and pick out the “swing points”, since such points may reveal more information to data analysts.
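The idea of a "swing point" can be made concrete with a short sketch. The abstract does not name its three distance functions, so the Euclidean, Manhattan, and Chebyshev distances used here are an assumption, as are the helper names and the toy coordinates.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


# Assumed choice of three metrics; the abstract does not list them.
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}


def nearest_center(point, centers, metric):
    """Index of the center closest to `point` under `metric`."""
    return min(range(len(centers)), key=lambda j: metric(point, centers[j]))


def swing_points(points, centers):
    """Return the points whose assigned center depends on the metric."""
    swings = []
    for p in points:
        labels = {name: nearest_center(p, centers, m) for name, m in METRICS.items()}
        if len(set(labels.values())) > 1:
            swings.append((p, labels))
    return swings


centers = [(0, 0), (5, 2)]
points = [(1, 0), (2, 2), (5, 3)]
result = swing_points(points, centers)
print(result)
```

Here only the point (2, 2) changes cluster: the Manhattan distance places it with the second center, while the Euclidean and Chebyshev distances place it with the first.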
Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Clustering algorithms are therefore in high demand to classify these MD snapshots and gain biological insights. There are two main categories of clustering algorithms, which group protein conformations into clusters based on the similarity of their shape (geometric clustering) or of their kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage, and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics are discussed along with the performance of different clustering algorithms. We note that there is no one-size-fits-all algorithm for the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. A main focus of our review is therefore to describe the merits and limitations of each clustering algorithm. We expect this review to help researchers choose appropriate clustering algorithms for their own MD datasets.
Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies to organise their transportation plans to achieve profits and customer satisfaction also increases. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers needs to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms combined with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
The characterization and clustering of rock discontinuity sets are a crucial and challenging task in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, there is no relevant literature summarizing these achievements, and this paper attempts to elaborate on the current status and prospects in this field. Specifically, this review discusses the development of clustering methods for discontinuity sets and the state-of-the-art algorithms. First, we introduce the importance of discontinuity clustering analysis, followed by the comprehensive characterization approaches for discontinuity data. A bibliometric analysis is then conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It is emphasized that the current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that the study of multi-parameter clustering methods needs to be extended. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
Wireless Sensor Networks (WSNs), as a crucial component of the Internet of Things (IoT), are widely used in environmental monitoring, industrial control, and security surveillance. However, WSNs still face challenges such as inaccurate node clustering, low energy efficiency, and shortened network lifespan in practical deployments, which significantly limit their large-scale application. To address these issues, this paper proposes an Adaptive Chaotic Ant Colony Optimization algorithm (AC-ACO) that aims to optimize the energy utilization and system lifespan of WSNs. AC-ACO combines the path-planning capability of Ant Colony Optimization (ACO) with the dynamic characteristics of chaotic mapping and introduces an adaptive mechanism to enhance the algorithm’s flexibility and adaptability. By dynamically adjusting the pheromone evaporation factor and heuristic weights, efficient node clustering is achieved. Additionally, a chaotic mapping initialization strategy is employed to enhance population diversity and avoid premature convergence. To validate the algorithm’s performance, this paper compares AC-ACO with clustering methods such as Low-Energy Adaptive Clustering Hierarchy (LEACH), ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Simulation results demonstrate that AC-ACO outperforms the compared algorithms in key metrics such as energy consumption optimization, network lifetime extension, and communication delay reduction, providing an efficient solution for improving energy efficiency and ensuring long-term stable operation of wireless sensor networks.
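The two mechanisms named in the AC-ACO abstract, chaotic initialization and a dynamically adjusted pheromone evaporation factor, can be sketched as follows. The abstract specifies neither the chaotic map nor the adaptation rule, so the logistic map with r = 4, the linear schedule from 0.9 to 0.1, and every function name below are assumptions for illustration, not the paper's method.

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Generate n values of the logistic map x <- r * x * (1 - x)."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_init(n, low, high, x0=0.37):
    """Spread n initial values over [low, high] using the chaotic sequence."""
    return [low + v * (high - low) for v in logistic_map_sequence(x0, n)]


def adaptive_evaporation(iteration, max_iter, rho_start=0.9, rho_end=0.1):
    """Hypothetical schedule: evaporation shrinks linearly as the search matures."""
    progress = iteration / max_iter
    return rho_start + (rho_end - rho_start) * progress


def evaporate(pheromone, rho):
    """Standard ACO evaporation step: tau <- (1 - rho) * tau."""
    return [(1.0 - rho) * tau for tau in pheromone]


initial_pheromone = chaotic_init(5, low=0.1, high=1.0)
rho_early = adaptive_evaporation(0, 100)
rho_late = adaptive_evaporation(100, 100)
after = evaporate(initial_pheromone, rho_early)
print(initial_pheromone, rho_early, rho_late)
```

A high evaporation factor early on forgets poor initial trails quickly, while a low factor later preserves the trails the colony has converged on; other schedules would serve the same illustrative purpose.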
Recognizing discontinuities within rock masses is a critical aspect of rock engineering. The development of remote sensing technologies has significantly enhanced the quality and quantity of the point clouds collected from rock outcrops. In response, we propose a workflow that balances accuracy and efficiency to extract discontinuities from massive point clouds. The proposed method employs voxel filtering to downsample point clouds, constructs a point cloud topology using K-d trees, utilizes principal component analysis to calculate the point cloud normals, and employs the pointwise clustering (PWC) algorithm to extract discontinuities from rock outcrop point clouds. This method provides information on the location and orientation (dip direction and dip angle) of the discontinuities, and the modified whale optimization algorithm (MWOA) is utilized to identify major discontinuity sets and their average orientations. Performance evaluations based on three real cases demonstrate that the proposed method significantly reduces computational time without sacrificing accuracy. In particular, the method yields more reasonable extraction results for discontinuities with certain undulations. The presented approach offers a novel tool for efficiently extracting discontinuities from large-scale point clouds.
Accurate perception of the performance degradation of a fuel cell is very important for detecting its health state. However, inconsistent operating conditions of fuel cell vehicles during testing result in errors in the data. To obtain a more credible degradation rate, this study proposes a novel method that uses dimensionality reduction and clustering algorithms to group the experimental data collected under different working conditions into similar operating conditions. First, the high-dimensional experimental data collected from fuel cell vehicles are projected into a three-dimensional feature vector space via principal component analysis (PCA). The three-dimensional feature vectors are then input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be classified. Finally, the selected voltage data can be used to precisely represent the true performance degradation of an on-board fuel cell stack. The results show that the voltage selected by the K-means algorithm declines the fastest, followed by that of the DBSCAN algorithm and finally the original data, which indicates that the performance of the fuel cell actually declines faster than the original data suggest. Early intervention can prolong its life to the greatest extent.
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only realizes load balance among sensor nodes but also achieves a fairly uniform distribution of cluster heads across the network. Simulation results also demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm can also be easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
To address the poor performance of commonly used intelligent optimization algorithms in solving location problems, specifically regarding effectiveness, efficiency, and stability, this study proposes a novel location allocation method for the delivery sites that distribute daily necessities during epidemic quarantines. After establishing the optimization objectives and constraints, we developed a mathematical model based on the collected data and utilized traditional intelligent optimization algorithms to obtain Pareto optimal solutions. Building on the characteristics of these Pareto front solutions, we introduced an improved clustering algorithm and conducted simulation experiments using data from Changchun City. The results demonstrate that the proposed algorithm outperforms traditional intelligent optimization algorithms in terms of effectiveness, efficiency, and stability, achieving reductions of approximately 12% and 8% in time and labor costs, respectively, compared to the baseline algorithm.
Failure mode and effect analysis (FMEA) is a preventative risk evaluation method used to evaluate and eliminate failure modes within a system. However, the traditional FMEA method exhibits many deficiencies that pose challenges in practical applications. To improve the conventional FMEA, many modified FMEA models have been suggested. However, the majority of them inadequately address consensus issues and focus on achieving a complete ranking of failure modes. In this research, we propose a new FMEA approach that integrates a two-stage consensus reaching model and a density peak clustering algorithm for the assessment and clustering of failure modes. First, we employ interval 2-tuple linguistic variables (I2TLVs) to express the uncertain risk evaluations provided by FMEA experts. Then, a two-stage consensus reaching model is adopted to enable FMEA experts to reach a consensus. Next, failure modes are categorized into several risk clusters using a density peak clustering algorithm. Finally, the proposed FMEA is illustrated by a case study of load-bearing guidance devices of subway systems. The results show that the proposed FMEA model can more easily describe the uncertain risk information of failure modes by using the I2TLVs; that the introduction of an endogenous feedback mechanism and an exogenous feedback mechanism can accelerate the consensus reaching process; and that the density peak clustering of failure modes improves the practical applicability of FMEA.
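The density peak clustering step that groups failure modes into risk clusters can be sketched in its basic form: a cluster center is a point with both a high local density and a large distance to any denser point. The abstract gives no implementation details, so the cutoff kernel, the index-based tie-breaking, the plain 2-D coordinates standing in for aggregated risk evaluations, and the function name are all assumptions.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Basic density peak clustering with a cutoff kernel of radius d_c."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n      # distance to the nearest denser point
    parent = [-1] * n      # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
        else:
            j = min(order[:rank], key=lambda h: dist[i][h])
            delta[i], parent[i] = dist[i][j], j
    # Centers are the points with the largest rho * delta.
    centers = sorted(order, key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Hypothetical 2-D risk scores for ten failure modes.
failure_modes = [
    (1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),
    (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5),
]
labels = density_peaks(failure_modes, d_c=1.0, n_clusters=2)
print(labels)
```

Unlike a complete ranking, this output only says which failure modes belong to the same risk cluster, which matches the abstract's stated motivation for clustering rather than ranking.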
With the rapid advancement of the Internet, network attack methods are constantly evolving and adapting. To better identify network attack behavior, a universal gravitation clustering algorithm was proposed by analyzing the dissimilarities and similarities of clustering algorithms. First, the algorithm sets the cluster set to empty and introduces a new object. Subsequently, a new cluster based on the given object is constructed. The dissimilarities between it and each existing cluster are calculated using a defined difference measure, and the minimum dissimilarity is selected. A comparison of the proposed algorithm with the traditional Back Propagation (BP) neural network and the nearest neighbor detection algorithm on the Defense Advanced Research Projects Agency (DARPA) 00 and Knowledge Discovery and Data Mining (KDD) Cup 99 datasets showed that the proposed algorithm surpassed both in terms of detection rate, speed, false positive rate, and false negative rate.
With the increasing integration of emerging source-load types such as distributed photovoltaics, electric vehicles, and energy storage into distribution networks, the operational characteristics of these networks have evolved from traditional single-load centers to complex multi-source, multi-load systems. This transition not only makes distribution networks harder to classify effectively because of their heightened complexity, but also renders traditional energy management approaches, which focus primarily on economic objectives, insufficient to meet the growing demands for flexible scheduling and dynamic response. To address these challenges, this paper proposes an adaptive multi-objective energy management strategy that accounts for the distinct operational requirements of distribution networks with a high penetration of new-type source-loads. The goal is to establish a comprehensive energy management framework that optimally balances energy efficiency, carbon reduction, and economic performance in modern distribution networks. To enhance classification accuracy, the strategy constructs a multi-dimensional scenario classification model that integrates environmental and climatic factors by analyzing the operational characteristics of new-type distribution networks and incorporating expert knowledge. An improved split-coupling K-means preclustering algorithm is employed to classify distribution networks effectively. Based on the classification results, fuzzy logic control is then utilized to dynamically optimize the weighting of each objective, allowing the priorities to be adjusted adaptively to achieve a flexible and responsive multi-objective energy management strategy. The effectiveness of the proposed approach is validated through practical case studies. Simulation results indicate that the proposed method improves classification accuracy by 18.18% compared to traditional classification methods and enhances energy savings and carbon reduction by 4.34% and 20.94%, respectively, compared to the fixed-weight strategy.
Three-dimensional ocean subsurface temperature and salinity structures (OST/OSS) in the South China Sea (SCS) play crucial roles in oceanic climate research and disaster mitigation. Traditionally, real-time OST and OSS are mainly obtained through in-situ ocean observations and simulation by ocean circulation models, which are usually challenging and costly. Recently, dynamical, statistical, and machine learning models have been proposed to invert the OST/OSS from sea surface information; however, these models mainly focus on the inversion of monthly OST and OSS. To address this issue, we apply clustering algorithms and employ a stacking strategy to ensemble three models (XGBoost, Random Forest, and LightGBM) to invert the real-time OST/OSS based on satellite-derived data and the Argo dataset. Subsequently, a fusion of temperature and salinity is employed to reconstruct OST and OSS. In the validation dataset, the depth-averaged Correlation (Corr) of the estimated OST (OSS) is 0.919 (0.83), and the average Root-Mean-Square Error (RMSE) is 0.639 °C (0.087 psu), with a depth-averaged coefficient of determination (R^2) of 0.84 (0.68). Notably, at the thermocline, where the base models exhibit their maximum error, the stacking-based fusion model shows a significant performance enhancement, with a maximum enhancement in OST and OSS inversion exceeding 10%. We further found that the estimated OST and OSS exhibit good agreement with the HYbrid Coordinate Ocean Model (HYCOM) data and the BOA_Argo dataset during the passage of a mesoscale eddy. This study shows that the proposed model can effectively invert the real-time OST and OSS, potentially enhancing the understanding of multi-scale oceanic processes in the SCS.
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next iteration. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, in terms of both the total number of distance calculations and the overall computation time.
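The k-means problem stated in words in this abstract can be written compactly as an optimization problem. The symbols $x_1, \dots, x_n$ for the data points and $c_1, \dots, c_k$ for the centers are notation assumed here; the abstract itself introduces only $n$, $d$, $k$, and $R^d$.

```latex
\min_{c_1, \dots, c_k \in \mathbb{R}^d} \;
\frac{1}{n} \sum_{i=1}^{n} \; \min_{1 \le j \le k} \lVert x_i - c_j \rVert^{2}
```

The inner minimum picks the nearest center for each point, which is why the cost of a naive iteration is dominated by the $n \cdot k$ distance calculations that the enhanced algorithm aims to reduce.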
In order to mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, the similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented. Third, taking the hybrid similarity measurement method as the distance criterion and drawing on a grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
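A weight-based hybrid of semantic and numerical similarity, as described in the coal mine abstract, might look like the sketch below. The abstract does not define either component measure, so the Jaccard overlap of term sets, the 1 / (1 + Euclidean distance) numerical similarity, the record layout, and the default weight of 0.5 are all hypothetical choices.

```python
import math


def semantic_similarity(terms_a, terms_b):
    """Jaccard overlap between two sets of descriptive terms (assumed measure)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def numerical_similarity(x, y):
    """Map the Euclidean distance into (0, 1]; identical vectors score 1."""
    return 1.0 / (1.0 + math.dist(x, y))


def hybrid_similarity(rec_a, rec_b, weight=0.5):
    """Weighted mix: `weight` on the semantic part, the rest on the numerical part."""
    sem = semantic_similarity(rec_a["terms"], rec_b["terms"])
    num = numerical_similarity(rec_a["values"], rec_b["values"])
    return weight * sem + (1.0 - weight) * num


def hybrid_distance(rec_a, rec_b, weight=0.5):
    """Dissimilarity usable as the distance criterion of a clustering algorithm."""
    return 1.0 - hybrid_similarity(rec_a, rec_b, weight)


# Hypothetical supervising records: descriptive terms plus sensor readings.
record_a = {"terms": {"gas", "alarm"}, "values": (0.8, 30.0)}
record_b = {"terms": {"gas", "normal"}, "values": (0.8, 30.0)}
similarity = hybrid_similarity(record_a, record_b)
print(similarity)
```

The two records share one of three distinct terms and have identical readings, so the semantic part contributes 1/3 and the numerical part 1, each weighted by 0.5.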
This article proposes a novel stable clustering design method for hierarchical satellite networks, based on a 5-dimensional vector model, in order to increase network stability, reduce storage overhead, and exert effective control over delay performance. According to the stability measurement function and subject to the constraint of minimal average routing table length, the hierarchical satellite network is grouped into separate stable connected clusters to improve destruction resistance and reconstruction ability in the future integrated network. In each cluster, redundant communication links that contribute little to network stability and have only a slight influence on delay variation are deleted, so that the requirements for stability and connectivity are satisfied with optimal link resources. In addition, the idea of logical weight is introduced to select the optimal satellites used to communicate with satellites in neighboring clusters. Finally, the feasibility and effectiveness of the proposed method are verified by comparing it with the simulated performance of two other typical hierarchical satellite networks, double layer satellite constellation (DLSC) and satellite over satellite (SOS).
Funding: funded by the State Grid Limited Science and Technology Project of China, Grant Number SGSXDK00DJJS2200144.
文摘At present,the proportion of new energy in the power grid is increasing,and the random fluctuations in power output increase the risk of cascading failures in the power grid.In this paper,we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering(DEC)algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids.First,considering the real-time operation status and system structure of new energy power grids,the scenario cascading failure risk indicator is established.Based on this indicator,the risk of cascading failure is calculated for the scenario set,the scenarios are clustered based on the DEC algorithm,and the scenarios with the highest indicators are selected as the significant risk scenario set.The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently being used to undertake slope stability analysis. In the SMR classification system, data are allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in determining the class boundaries developed by the SMR system, the system classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in the SMR classification system, no in-advance experience-based judgment was made on the number of extracted classes in this system, and it was only after all steps of the clustering algorithms were accomplished that a new classification scheme was proposed for the SMR system under different failure modes based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results were given. It is shown that a distance function greatly affects clustering results, and that comparing such different results can be used to detect the outliers of a cluster and to give shape information about clusters. In practice, it is suggested to use different distance functions separately, compare the clustering results and pick out the "swing points"; such points may reveal more information to data analysts.
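The idea of comparing results under different distance functions can be sketched as follows. The abstract does not name the three distance functions studied, so Euclidean, Manhattan and Chebyshev distances are assumed here purely for illustration; the helper names and the toy centers are hypothetical.

```python
from math import sqrt


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


# Assumed set of metrics; the abstract does not list the ones it compares.
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}


def assignments_by_metric(point, centers):
    """Map each metric name to the name of the nearest center under that metric."""
    return {
        name: min(centers, key=lambda c: fn(point, centers[c]))
        for name, fn in METRICS.items()
    }


def is_swing_point(point, centers):
    """A point is a 'swing point' here if the metrics disagree on its cluster."""
    return len(set(assignments_by_metric(point, centers).values())) > 1


centers = {"A": (0.0, 0.0), "B": (6.5, 2.5)}
point = (4.0, 0.0)
result = assignments_by_metric(point, centers)
```

For this point the Manhattan distance prefers center A while the other two metrics prefer center B, which is the kind of disagreement the abstract suggests inspecting.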
Funding: Supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20170413173837121); the Hong Kong Research Grant Council (HKUST C6009-15G, 14203915, 16302214, 16304215, 16318816, and AoE/P-705/16); the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) (OSR-2016-CRG5-3007); the Guangzhou Science Technology and Innovation Commission (201704030116); and the Innovation and Technology Commission (ITCPD/17-9 and ITC-CNERC14SC01).
Abstract: Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Therefore, clustering algorithms have been in high demand to be developed and applied to classify these MD snapshots and gain biological insights. There mainly exist two categories of clustering algorithms that aim to group protein conformations into clusters based on the similarity of their shape (geometric clustering) and kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics will be discussed along with the performances of different clustering algorithms. We note that there does not exist a one-size-fits-all algorithm in the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review will help guide researchers in choosing appropriate clustering algorithms for their own MD datasets.
Funding: Deanship of Scientific Research, for funding and supporting this research through the initiative of DSR Graduate Students Research Support (GSR).
Abstract: Mobile commerce (m-commerce) contributes to increasing the popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers need to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms contribute to achieving excellent performance in terms of both quality of solutions and processing time.
Funding: Funding support from the National Natural Science Foundation of China (Grant No. 42007269), the Young Talent Fund of Xi'an Association for Science and Technology (Grant No. 959202313094), and the Fundamental Research Funds for the Central Universities, CHD (Grant No. 300102263401).
Abstract: The characterization and clustering of rock discontinuity sets are a crucial and challenging task in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, there is no relevant literature summarizing these achievements, and this paper attempts to elaborate on the current status and prospects in this field. Specifically, this review aims to discuss the development process of clustering methods for discontinuity sets and the state-of-the-art relevant algorithms. First, we introduce the importance of discontinuity clustering analysis and follow with the comprehensive characterization approaches of discontinuity data. A bibliometric analysis is subsequently conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It is emphasized that the current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that there is a need to extend the study of multi-parameter clustering methods. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
Funding: Funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 22D01B148); Bidding Topics for the Center for Integration of Education and Production and Development of New Business in 2024 (No. 2024-KYJD05); Basic Scientific Research Business Fee Project of Colleges and Universities in Autonomous Region (No. XJEDU2025P126); Xinjiang College of Science & Technology School-level Scientific Research Fund Project (No. 2024-KYTD01).
Abstract: Wireless Sensor Networks (WSNs), as a crucial component of the Internet of Things (IoT), are widely used in environmental monitoring, industrial control, and security surveillance. However, WSNs still face challenges such as inaccurate node clustering, low energy efficiency, and shortened network lifespan in practical deployments, which significantly limit their large-scale application. To address these issues, this paper proposes an Adaptive Chaotic Ant Colony Optimization algorithm (AC-ACO), aiming to optimize the energy utilization and system lifespan of WSNs. AC-ACO combines the path-planning capability of Ant Colony Optimization (ACO) with the dynamic characteristics of chaotic mapping and introduces an adaptive mechanism to enhance the algorithm’s flexibility and adaptability. By dynamically adjusting the pheromone evaporation factor and heuristic weights, efficient node clustering is achieved. Additionally, a chaotic mapping initialization strategy is employed to enhance population diversity and avoid premature convergence. To validate the algorithm’s performance, this paper compares AC-ACO with clustering methods such as Low-Energy Adaptive Clustering Hierarchy (LEACH), ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Simulation results demonstrate that AC-ACO outperforms the compared algorithms in key metrics such as energy consumption optimization, network lifetime extension, and communication delay reduction, providing an efficient solution for improving energy efficiency and ensuring long-term stable operation of wireless sensor networks.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42407232) and the Sichuan Science and Technology Program (Grant No. 2024NSFSC0826).
Abstract: Recognizing discontinuities within rock masses is a critical aspect of rock engineering. The development of remote sensing technologies has significantly enhanced the quality and quantity of the point clouds collected from rock outcrops. In response, we propose a workflow that balances accuracy and efficiency to extract discontinuities from massive point clouds. The proposed method employs voxel filtering to downsample point clouds, constructs a point cloud topology using K-d trees, utilizes principal component analysis to calculate the point cloud normals, and employs the pointwise clustering (PWC) algorithm to extract discontinuities from rock outcrop point clouds. This method provides information on the location and orientation (dip direction and dip angle) of the discontinuities, and the modified whale optimization algorithm (MWOA) is utilized to identify major discontinuity sets and their average orientations. Performance evaluations based on three real cases demonstrate that the proposed method significantly reduces computational time costs without sacrificing accuracy. In particular, the method yields more reasonable extraction results for discontinuities with certain undulations. The presented approach offers a novel tool for efficiently extracting discontinuities from large-scale point clouds.
Funding: Supported by the special key project of Chongqing technological innovation and application development (cstc2019jscx-zdztzxX0033), the national key R&D plan of the Ministry of Science and Technology (sub-project) (2018YFB0105400), and the National Natural Science Foundation of China (21908142).
Abstract: Accurate perception of the performance degradation of a fuel cell is very important for detecting its health state. However, inconsistent operating conditions of fuel cell vehicles during testing result in errors in the data. In order to obtain a more credible degradation rate, this study proposes a novel method to classify the experimental data collected under different working conditions into similar operating conditions by using dimensionality reduction and clustering algorithms. First, the experimental data collected from fuel cell vehicles are high-dimensional. These high-dimensional data are then projected into a three-dimensional feature vector space via principal component analysis (PCA). The dimension-reduced three-dimensional feature vectors are input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be classified. Finally, the selected voltage data can be used to precisely represent the true performance degradation of an on-board fuel cell stack. The results show that the voltage selected with the K-means algorithm declines the fastest, followed by that selected with the DBSCAN algorithm and finally the original data, which indicates that the performance of the fuel cell actually declines faster than the original data suggest. Early intervention can prolong its life to the greatest extent.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60872018, 60721002, 60875038; the National Basic Research 973 Program of China under Grant No. 2007CB310607; SRFDP Project under Grant No. 20070293001; the Science and Technology Support Foundation of Jiangsu Province under Grant No. BE2009142 and BE2010180; the Scientific Research Foundation of Graduate School of Nanjing University under Grant No. 2011CL07.
Abstract: Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only realizes load balance among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results also demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Funding: National Natural Science Foundation of China (62202477).
Abstract: To address the poor performance of commonly used intelligent optimization algorithms in solving location problems (specifically regarding effectiveness, efficiency, and stability), this study proposes a novel location allocation method for the delivery sites used to deliver daily necessities during epidemic quarantines. After establishing the optimization objectives and constraints, we developed a relevant mathematical model based on the collected data and utilized traditional intelligent optimization algorithms to obtain Pareto optimal solutions. Building on the characteristics of these Pareto front solutions, we introduced an improved clustering algorithm and conducted simulation experiments using data from Changchun City. The results demonstrate that the proposed algorithm outperforms traditional intelligent optimization algorithms in terms of effectiveness, efficiency, and stability, achieving reductions of approximately 12% and 8% in time and labor costs, respectively, compared to the baseline algorithm.
Funding: Supported by the Fundamental Research Funds for the Central Universities (22120240094) and the Humanities and Social Science Fund of Ministry of Education China (22YJA630082).
Abstract: Failure mode and effect analysis (FMEA) is a preventative risk evaluation method used to evaluate and eliminate failure modes within a system. However, the traditional FMEA method exhibits many deficiencies that pose challenges in practical applications. To improve the conventional FMEA, many modified FMEA models have been suggested. However, the majority of them inadequately address consensus issues and focus on achieving a complete ranking of failure modes. In this research, we propose a new FMEA approach that integrates a two-stage consensus reaching model and a density peak clustering algorithm for the assessment and clustering of failure modes. First, we employ interval 2-tuple linguistic variables (I2TLVs) to express the uncertain risk evaluations provided by FMEA experts. Then, a two-stage consensus reaching model is adopted to enable FMEA experts to reach a consensus. Next, failure modes are categorized into several risk clusters using a density peak clustering algorithm. Finally, the proposed FMEA is illustrated by a case study of load-bearing guidance devices of subway systems. The results show that the proposed FMEA model can more easily describe the uncertain risk information of failure modes by using the I2TLVs; the introduction of an endogenous feedback mechanism and an exogenous feedback mechanism can accelerate the process of consensus reaching; and the density peak clustering of failure modes successfully improves the practical applicability of FMEA.
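The density peak clustering step can be illustrated with a minimal sketch of the two quantities such algorithms typically compute: a local density for each point and the distance to the nearest point of higher density. This is a generic version on plain numeric points with a hard cutoff distance, not the paper's I2TLV-based formulation; the function name, the cutoff `d_c` and the toy points are assumptions.

```python
from math import dist


def density_peaks(points, d_c):
    """Return (rho, delta) per point: local density and distance to a denser point."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # The globally densest point has no denser neighbour; use its largest distance.
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Toy data (hypothetical): two well-separated groups.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (10, 10), (11, 10), (10, 11), (9, 10)]
rho, delta = density_peaks(points, d_c=1.2)
# Candidate cluster centers: points with both high density and high delta.
peaks = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)[:2]
```

Points that combine high `rho` with high `delta` are taken as cluster centers, and the remaining points would then be attached to the cluster of their nearest denser neighbour.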
Funding: Supported by the Fujian China University Education Informatization Project (FJGX2023013); National Natural Science Foundation of China Youth Program (72001126); Sanming University’s Research and Optimization of the Function of Safety Test Management and Control Platform Project (KH22097); Young and Middle-Aged Teacher Education Research Project of Fujian Provincial Department of Education (JAT200642, B202033).
Abstract: With the rapid advancement of the Internet, network attack methods are constantly evolving and adapting. To better identify network attack behavior, a universal gravitation clustering algorithm was proposed by analyzing the dissimilarities and similarities of clustering algorithms. First, the algorithm initialized the cluster set as empty and introduced a new object. Subsequently, a new cluster based on the given object was constructed. The dissimilarities between it and each existing cluster were calculated using a defined difference measure, and the minimum dissimilarity was selected. A comparison of the proposed algorithm with the traditional Back Propagation (BP) neural network and the nearest neighbor detection algorithm on the Defense Advanced Research Projects Agency (DARPA) 00 and Knowledge Discovery and Data Mining (KDD) Cup 99 datasets revealed that the performance of the proposed algorithm surpassed that of both algorithms in terms of the detection rate, speed, false positive rate, and false negative rate.
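The incremental procedure described above (start from an empty cluster set, take each new object, measure its dissimilarity to every existing cluster and pick the minimum) can be sketched as follows. The abstract does not give the gravitation-based difference measure or the rule for opening a new cluster, so this sketch substitutes the Euclidean distance to the cluster centroid and a fixed threshold; both, along with all names, are assumptions.

```python
from math import dist


def incremental_clustering(objects, threshold):
    """Assign each object to its least-dissimilar cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": tuple, "members": list of tuples}
    for obj in objects:
        if clusters:
            # Stand-in dissimilarity: Euclidean distance to the centroid (assumed).
            best = min(clusters, key=lambda c: dist(obj, c["centroid"]))
            if dist(obj, best["centroid"]) <= threshold:
                best["members"].append(obj)
                members = best["members"]
                best["centroid"] = tuple(sum(x) / len(members) for x in zip(*members))
                continue
        # No cluster is close enough (or none exists yet): start a new cluster.
        clusters.append({"centroid": tuple(obj), "members": [obj]})
    return clusters


# Toy data (hypothetical): two groups arriving in mixed order.
objects = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 0.5)]
clusters = incremental_clustering(objects, threshold=2.0)
sizes = [len(c["members"]) for c in clusters]
```

In an intrusion detection setting, small or late-forming clusters would be the natural candidates for anomalous traffic, though how the paper labels clusters is not stated in the abstract.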
Funding: Supported by the Science and Technology Project of the Headquarters of the State Grid Corporation (project code: 5400-202323233A-1-1-ZN).
Abstract: With the increasing integration of emerging source-load types such as distributed photovoltaics, electric vehicles, and energy storage into distribution networks, the operational characteristics of these networks have evolved from traditional single-load centers to complex multi-source, multi-load systems. This transition not only increases the difficulty of effectively classifying distribution networks due to their heightened complexity but also renders traditional energy management approaches, primarily focused on economic objectives, insufficient to meet the growing demands for flexible scheduling and dynamic response. To address these challenges, this paper proposes an adaptive multi-objective energy management strategy that accounts for the distinct operational requirements of distribution networks with a high penetration of new-type source-loads. The goal is to establish a comprehensive energy management framework that optimally balances energy efficiency, carbon reduction, and economic performance in modern distribution networks. To enhance classification accuracy, the strategy constructs a multi-dimensional scenario classification model that integrates environmental and climatic factors by analyzing the operational characteristics of new-type distribution networks and incorporating expert knowledge. An improved split-coupling K-means preclustering algorithm is employed to classify distribution networks effectively. Based on the classification results, fuzzy logic control is then utilized to dynamically optimize the weighting of each objective, allowing for an adaptive adjustment of priorities to achieve a flexible and responsive multi-objective energy management strategy. The effectiveness of the proposed approach is validated through practical case studies. Simulation results indicate that the proposed method improves classification accuracy by 18.18% compared to traditional classification methods and enhances energy savings and carbon reduction by 4.34% and 20.94%, respectively, compared to the fixed-weight strategy.
Funding: Jointly supported by the National Key Research and Development Program of China (2022YFC3104304); the National Natural Science Foundation of China (Grant No. 41876011); the 2022 Research Program of Sanya Yazhou Bay Science and Technology City (SKJC-2022-01-001); the Hainan Province Science and Technology Special Fund (ZDYF2021SHFZ265).
Abstract: Three-dimensional ocean subsurface temperature and salinity structures (OST/OSS) in the South China Sea (SCS) play crucial roles in oceanic climate research and disaster mitigation. Traditionally, real-time OST and OSS are mainly obtained through in-situ ocean observations and simulation by ocean circulation models, which are usually challenging and costly. Recently, dynamical, statistical, or machine learning models have been proposed to invert the OST/OSS from sea surface information; however, these models mainly focused on the inversion of monthly OST and OSS. To address this issue, we apply clustering algorithms and employ a stacking strategy to ensemble three models (XGBoost, Random Forest, and LightGBM) to invert the real-time OST/OSS based on satellite-derived data and the Argo dataset. Subsequently, a fusion of temperature and salinity is employed to reconstruct OST and OSS. In the validation dataset, the depth-averaged Correlation (Corr) of the estimated OST (OSS) is 0.919 (0.83), and the average Root-Mean-Square Error (RMSE) is 0.639°C (0.087 psu), with a depth-averaged coefficient of determination (R^2) of 0.84 (0.68). Notably, at the thermocline where the base models exhibit their maximum error, the stacking-based fusion model exhibited significant performance enhancement, with a maximum enhancement in OST and OSS inversion exceeding 10%. We further found that the estimated OST and OSS exhibit good agreement with the HYbrid Coordinate Ocean Model (HYCOM) data and BOA_Argo dataset during the passage of a mesoscale eddy. This study shows that the proposed model can effectively invert the real-time OST and OSS, potentially enhancing the understanding of multi-scale oceanic processes in the SCS.
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring a simple data structure to keep some information in each iteration to be used in the next iteration. Our experimental results demonstrated that our scheme can improve the computational speed of the k-means algorithm by the magnitude in the total number of distance calculations and the overall time of computation.
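The k-means objective stated in this abstract can be written compactly. The symbols below are introduced here for illustration and do not appear in the abstract: x_1, ..., x_n are the data points in R^d and C = {c_1, ..., c_k} is the set of centers.

```latex
\min_{C=\{c_1,\dots,c_k\}\subset\mathbb{R}^d}\;
\frac{1}{n}\sum_{i=1}^{n}\,\min_{1\le j\le k}\,\lVert x_i - c_j\rVert^{2}
```

The inner minimum picks the nearest center for each point, and the outer average is the mean squared distance that the choice of centers is meant to minimize.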
Funding: Supported by the National Natural Science Foundation of China (61304256); Zhejiang Provincial Natural Science Foundation of China (LQ13F030013); Project of the Education Department of Zhejiang Province (Y201327006); Young Researchers Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering and Zhejiang Sci-Tech University Key Laboratory (ZSTUME01B15); New Century 151 Talent Project of Zhejiang Province; 521 Talent Project of Zhejiang Sci-Tech University; Young and Middle-aged Talents Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering.
Funding: The National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: In order to mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied. First, the semantic and numerical-based hybrid description method of security supervising data in coal mines is described. Second, the similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented. Third, taking the hybrid similarity measurement method as the distance criterion and using a grid methodology for reference, an improved CURE clustering algorithm based on the grid is presented. Finally, the simulation results on a security supervising data set in coal mines validate the efficiency of the algorithm.
Funding: National Natural Science Foundation of China (60532030).
Abstract: This article proposes a novel stable clustering design method for hierarchical satellite networks in order to increase their stability, reduce the storage overhead and exert effective control over the delay performance, based on a 5-dimensional vector model. According to the stability measurement function and owing to the limitation of minimal average routing table length, the hierarchical satellite network is grouped into separate stable connected clusters to improve destruction resistance and reconstruction ability in the future integrated network. In each cluster, redundant communication links with little contribution to network stability and slight influence on delay variation are deleted to satisfy the requirements for stability and connectivity by means of optimal link resources, and the idea of logical weight is introduced to select the optimal satellites used to communicate with neighboring cluster satellites. Finally, the feasibility and effectiveness of the proposed method are verified by comparing it with the simulated performances of two other typical hierarchical satellite networks, double layer satellite constellation (DLSC) and satellite over satellite (SOS).