In this study, we used the multi-resolution graph-based clustering (MRGC) method to determine the electrofacies (EF) and lithofacies (LF) from well log data obtained from the intraplatform bank gas fields located in the Amu Darya Basin. The MRGC could automatically determine the optimal number of clusters without prior knowledge about the structure or cluster numbers of the analyzed data set and allowed the users to control the level of detail actually needed to define the EF. Based on the LF identification and successful EF calibration using core data, an MRGC EF partition model including five clusters and a quantitative LF interpretation chart were constructed. The EF clusters 1 to 5 were interpreted as lagoon, anhydrite flat, interbank, low-energy bank, and high-energy bank, and the coincidence rate in the cored interval could reach 85%. We concluded that the MRGC could be accurately applied to predict the LF in non-cored but logged wells. Therefore, continuous EF clusters were partitioned and the corresponding LF were interpreted, and the distribution and petrophysical characteristics of different LF were analyzed in the framework of sequence stratigraphy.
As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and "others") across a range of astronomical research over the past decade. This review focuses on the six key application areas of stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. This paper provides an in-depth analysis of the performance and results of each method, considering their respective suitabilities for different data types. Additionally, it presents clustering algorithm selection strategies based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges of complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a prediction model for deformation prediction, a multipoint ultrahigh arch dam deformation prediction model based on a clustering partition, namely the CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which effectively filters out noise and improves the signal-to-noise ratio of the data, providing high-quality input data for subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and ensure the consistency of deformation trends at measurement points within each partition. Finally, CEEMDAN is used to separate the monitoring data into components; each component is predicted and analyzed by combining KPCA (Kernel Principal Component Analysis) with the KELM (Kernel Extreme Learning Machine) optimized by the GSWOA (Global Search Whale Optimization Algorithm); and the component predictions are integrated via reconstruction methods to predict the overall trend of ultrahigh arch dam deformation. An extra-high arch dam project is taken as an example and validated via a comparative analysis of multiple models. The results show that the multipoint deformation prediction model in this paper can combine data from different measurement points, achieve a comprehensive, precise prediction of the deformation of extra-high arch dams, and provide strong technical support for safe operation.
AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who underwent at least 10 24-2 VF tests were included in this study. Using 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using the hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20±14.38 y; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88±15.22 y; 43 males and 38 females). All subjects were diagnosed with POAG. Fifty-two VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG with limited VF data.
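The region-wise regression described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that the TDVs of all points in a region are averaged per test before a least-squares line is fitted and extrapolated to the 10th test, a detail the abstract does not specify. The names `fit_line`, `predict_clusterwise`, and `rmse` are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def predict_clusterwise(series, labels, target=10):
    """series: point -> TDVs from tests 1..n; labels: point -> region id.

    Assumption: one regression line per region, fitted to the
    region-mean TDV, and shared by every point in that region.
    """
    regions = {}
    for point, region in labels.items():
        regions.setdefault(region, []).append(point)
    preds = {}
    for members in regions.values():
        n = len(series[members[0]])
        xs = list(range(1, n + 1))
        mean_tdv = [sum(series[p][t] for p in members) / len(members)
                    for t in range(n)]
        a, b = fit_line(xs, mean_tdv)
        for p in members:
            preds[p] = a + b * target
    return preds

def rmse(pred, actual):
    """Root mean square error over all predicted points."""
    return sqrt(sum((pred[p] - actual[p]) ** 2 for p in pred) / len(pred))
```

Pointwise linear regression corresponds to the special case in which every VF point is its own region.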
Multi-view clustering is a critical research area in computer science aimed at effectively extracting meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data, aiming to enhance clustering accuracy and robustness. In particular, the picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows for a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency, outperforming traditional methods. Specifically, the MPFC algorithm demonstrates outstanding clustering performance on a variety of datasets, attaining a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 for the 3 Sources dataset, underscoring its robustness and efficiency. The proposed approach significantly contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method for semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
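For readers unfamiliar with the Purity (PUR) score quoted above, the following sketch shows the standard definition: each cluster is credited with its most frequent ground-truth class, and the credited counts are divided by the number of samples. The function name `purity` is hypothetical, and the code is not taken from the MPFC paper.

```python
from collections import Counter

def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    per_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    # Credit each cluster with the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)
```

Clustering accuracy (ACC) additionally requires a one-to-one matching between cluster ids and class ids, so it is never larger than purity for the same assignment.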
Federated learning is a machine learning framework designed to protect privacy by keeping training data on clients' devices without sharing private data. It trains a global model through collaboration between clients and the server. However, the presence of data heterogeneity can lead to inefficient model training and even reduce the final model's accuracy and generalization capability. Meanwhile, data scarcity can result in suboptimal cluster distributions for few-shot clients in centralized clustering tasks, and standalone personalization tasks may cause severe overfitting issues. To address these limitations, we introduce a federated learning dual optimization model based on clustering and personalization strategy (FedCPS). FedCPS adopts a decentralized approach, where clients identify their cluster membership locally without relying on a centralized clustering algorithm. Building on this, FedCPS introduces personalized training tasks locally, adding a regularization term to control deviations between local and cluster models. This improves the generalization ability of the final model while mitigating overfitting. The use of weight-sharing techniques also reduces the computational cost of central machines. Experimental results on MNIST, FMNIST, CIFAR10, and CIFAR100 datasets demonstrate that our method achieves better personalization effects compared to other personalized federated learning methods, with an average test accuracy improvement of 0.81%–2.96%. Meanwhile, we adjusted the proportion of few-shot clients to evaluate the impact on accuracy across different methods. The experiments show that FedCPS reduces accuracy by only 0.2%–3.7%, compared to 2.1%–10% for existing methods. Our method demonstrates its advantages across diverse data environments.
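The abstract above does not state the form of the regularization term. One common choice, assumed here purely for illustration, is a squared L2 penalty that pulls the local weights toward the cluster model, so the local objective becomes loss(w) + (mu/2)·||w − w_cluster||². The function `proximal_step` and the parameters `lr` and `mu` are hypothetical names, not part of FedCPS.

```python
def proximal_step(w_local, w_cluster, grad, lr=0.1, mu=0.5):
    """One gradient step on loss(w) + (mu / 2) * ||w - w_cluster||^2.

    grad is the gradient of the task loss at w_local; the penalty
    contributes the extra term mu * (w - w_cluster) per coordinate.
    """
    return [w - lr * (g + mu * (w - wc))
            for w, wc, g in zip(w_local, w_cluster, grad)]
```

With `mu = 0` this reduces to plain local training; a larger `mu` keeps the personalized model closer to the cluster model, which is the trade-off between personalization and overfitting that the abstract describes.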
Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
In machine vision, elliptical targets frequently appear within the camera's region of interest (ROI). Ellipse detection is essential for shape detection and geometric measurements in machine vision. However, existing ellipse detection algorithms often face issues such as high computational complexity, strong dependence on initial conditions, sensitivity to noise, and lack of robustness to occlusions. In this paper, we propose a fast and robust ellipse detection method to address these challenges. This method first utilizes edge gradient and curvature information to segment the curve into circular arcs. Next, based on the convexity of the arcs, it divides them into different quadrants of the ellipse, then groups and fits the arcs according to multiple geometric constraints at a low computational cost. Finally, it reduces the parameter space for hierarchical clustering and then segments the complete ellipse into several sectors for verification. We compare our method across seven datasets, including five public image datasets and two from industrial camera scenes. Experimental results show that our method achieves a precision ranging from 67.1% to 98.9%, a recall ranging from 48.1% to 92.9%, and an F-measure ranging from 58.0% to 95.8%. The average execution time per image ranges from 25 ms to 192 ms, demonstrating both high accuracy and efficiency.
For multi-vehicle networks, the Cooperative Positioning (CP) technique has become a promising way to enhance vehicle positioning accuracy. In particular, CP performance can be further improved by introducing Sensor-Rich Vehicles (SRVs) into CP networks, which is called SRV-aided CP. However, in dense urban environments the CP system may split into several sub-clusters that cannot connect with each other, and the sub-clusters with few SRVs will suffer from degraded CP performance. Since Unmanned Aerial Vehicles (UAVs) have been widely used to aid vehicular communications, we intend to utilize UAVs to assist sub-clusters in CP. In this paper, a UAV-aided CP network is constructed to fully utilize information from SRVs. First, the inter-node connection structure among the UAV and vehicles is designed to share available information from SRVs. After that, a clustering optimization strategy is proposed, in which the UAV cooperates with the high-precision sub-cluster to obtain available information from SRVs and then broadcasts this positioning-related information to the low-precision sub-clusters. Finally, the Locally-Centralized Factor Graph Optimization (LC-FGO) algorithm is designed to fuse positioning information from cooperators. Simulation results indicate that the positioning accuracy of the CP system can be improved by fully utilizing positioning-related information from SRVs.
Customer segmentation according to load-shape profiles using smart meter data is an increasingly important application, vital to the planning and operation of energy systems and to enabling citizens' participation in the energy transition. This study proposes an innovative multi-step clustering procedure to segment customers based on load-shape patterns at the daily and intra-daily time horizons. Smart meter data is split between daily and hourly normalized time series to assess monthly, weekly, daily, and hourly seasonality patterns separately. The dimensionality reduction implicit in the splitting allows a direct approach to clustering raw daily energy time series data. The intraday clustering procedure sequentially identifies representative hourly day-unit profiles for each customer and the entire population. For the first time, a step function approach is applied to reduce time series dimensionality. Customer attributes embedded in surveys are employed to build external clustering validation metrics using Cramer's V correlation factors and to identify statistically significant determinants of load-shape in energy usage. In addition, a time series features engineering approach is used to extract 16 relevant demand flexibility indicators that characterize customers and corresponding clusters along four different axes: available Energy (E), Temporal patterns (T), Consistency (C), and Variability (V). The methodology is implemented on a real-world electricity consumption dataset of 325 Small and Medium-sized Enterprise (SME) customers, identifying 4 daily and 6 hourly easy-to-interpret, well-defined clusters. The application of the methodology includes selecting key parameters via grid search and a thorough comparison of clustering distances and methods to ensure the robustness of the results. Further research can test the scalability of the methodology to larger datasets from various customer segments (households and large commercial) and locations with different weather and socioeconomic conditions.
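The Cramer's V statistic used above for external validation can be computed from a contingency table of cluster labels against a categorical survey attribute. The sketch below uses the standard uncorrected definition, V = sqrt(χ² / (n · (min(r, c) − 1))); whether the study applies a bias correction is not stated, so none is assumed. The function name `cramers_v` is hypothetical, and the table is assumed to have at least two rows, two columns, and no empty row or column.

```python
from math import sqrt

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(rows, cols) - 1)))
```

A value near 0 indicates that the attribute is unrelated to cluster membership, while a value near 1 indicates that the attribute almost determines the cluster.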
To enhance the denoising performance of event-based sensors, we introduce a clustering-based temporal deep neural network denoising method (CBTDNN). First, a combination of density-based spatial clustering of applications with noise (DBSCAN) and K-means++ is used to cluster the sensor output data and obtain the respective cluster centers. Subsequently, long short-term memory (LSTM) is employed to fit and yield optimized cluster centers with temporal information. Lastly, based on the new cluster centers and denoising ratio, a radius threshold is set, and noise points beyond this threshold are removed. The comprehensive denoising metric (F1 score) of CBTDNN reaches 0.8931, 0.7735, and 0.9215 on the traffic sequences dataset, pedestrian detection dataset, and turntable dataset, respectively. These metrics represent improvements of 49.90%, 33.07%, 19.31%, and 22.97% compared to four contrastive algorithms, namely nearest neighbor (NNb), nearest neighbor with polarity (NNp), Autoencoder, and multilayer perceptron denoising filter (MLPF). These results demonstrate that the proposed method enhances the denoising performance of event-based sensors.
Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationship between samples. To overcome these challenges, this paper proposes a novel multi-order neighborhood fusion based multi-view deep subspace clustering model. We creatively integrate the multi-order proximity graph structures of different views into the self-expressive layer by a multi-order neighborhood fusion module. By this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix; then, we can obtain an optimal global affinity matrix where each connected node belongs to one cluster. In addition, the discriminative constraint between views is designed to further improve the clustering performance. A range of experiments on six public datasets demonstrates that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
Numerous clustering algorithms are valuable in pattern recognition in forest vegetation, with new ones continually being proposed. While some are well-known, others are underutilized in vegetation science. This study compares the performance of practical iterative reallocation algorithms with model-based clustering algorithms. The data is from forest vegetation in Virginia (United States), the Hyrcanian Forest (Asia), and European beech forests. Practical iterative reallocation algorithms were applied as non-hierarchical methods and finite Gaussian mixture modeling was used as a model-based clustering method. Due to limitations on dimensionality in model-based clustering, principal coordinates analysis was employed to reduce the dataset's dimensions. A log transformation was applied to achieve a normal distribution for the pseudo-species data before calculating the Bray-Curtis dissimilarity. The findings indicate that the reallocation of misclassified objects based on silhouette width (OPTSIL) with Flexible-β (-0.25) had the highest mean among the tested clustering algorithms, with Silhouette width 1 (REMOS1) with Flexible-β (-0.25) second. However, model-based clustering performed poorly. Based on these results, we recommend using OPTSIL with Flexible-β (-0.25) and REMOS1 with Flexible-β (-0.25) for forest vegetation classification instead of model-based clustering, particularly for the heterogeneous datasets common in forest vegetation community data.
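Both OPTSIL and REMOS rely on the silhouette width of each object, which compares its mean dissimilarity to its own cluster (a) with its mean dissimilarity to the nearest other cluster (b) as (b − a) / max(a, b). The sketch below computes these widths from a precomputed dissimilarity matrix (for example Bray-Curtis); it does not implement the reallocation loop of either algorithm. The function name `silhouette_widths` is hypothetical, at least two clusters are assumed, and singleton clusters are given a width of 0 by convention.

```python
def silhouette_widths(dist, labels):
    """Silhouette width per object from a square dissimilarity matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + dist[i][j]
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            # Object is alone in its cluster: width defined as 0.
            widths.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in sums if c != own)
        widths.append((b - a) / max(a, b))
    return widths
```

An object with a negative width is, on average, closer to another cluster than to its own, which is the sense in which the abstract speaks of "misclassified objects".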
The characterization and clustering of rock discontinuity sets are a crucial and challenging task in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, there is no relevant literature summarizing these achievements, and this paper attempts to elaborate on the current status and prospects in this field. Specifically, this review aims to discuss the development process of clustering methods for discontinuity sets and the state-of-the-art relevant algorithms. First, we introduce the importance of discontinuity clustering analysis, followed by the comprehensive characterization approaches for discontinuity data. A bibliometric analysis is subsequently conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It is emphasized that the current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that there is a need to extend the study of multi-parameter clustering methods. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
The recently proposed symplectic symmetry approach to clustering (SSAC) in atomic nuclei is modified and further developed in more detail. It is first applied to the light two-cluster $^{20}$Ne+$\alpha$ system of $^{24}$Mg, the latter exhibiting well-developed low-energy $K^{\pi}=0_{1}^{+}$, $K^{\pi}=2_{1}^{+}$, and $K^{\pi}=0_{1}^{-}$ rotational bands in its spectrum. A simple algebraic Hamiltonian, consisting of dynamical symmetry, residual, and vertical mixing parts, is used to describe these three lowest rotational bands of positive and negative parity in $^{24}$Mg. A good description of the excitation energies is obtained by considering only the SU(3) cluster states restricted to the stretched many-particle Hilbert subspace, built on the leading Pauli-allowed SU(3) multiplet for the positive- and negative-parity states, respectively. The coupling to the higher cluster-model configurations allows us to describe the known low-lying experimentally observed B(E2) transition probabilities within and between the cluster states of the three bands under consideration without the use of an effective charge.
Cluster-based models have numerous application scenarios in vehicular ad-hoc networks (VANETs) and can greatly help improve the communication performance of VANETs. However, the frequent movement of vehicles can often lead to changes in the network topology, thereby reducing cluster stability in urban scenarios. To address this issue, we propose a clustering model based on the density peak clustering (DPC) method and sparrow search algorithm (SSA), named SDPC. First, the model constructs a fitness function based on the parameters obtained from the DPC method and deploys the SSA for iterative optimization to select cluster heads (CHs). Then, the vehicles that have not been selected as CHs are assigned to appropriate clusters by comprehensively considering the distance parameter and link-reliability parameter. Finally, cluster maintenance strategies are considered to tackle the changes in the clusters' organizational structure. To verify the performance of the model, we conducted a simulation on a real-world scenario for multiple metrics related to clusters' stability. The results show that compared with the APROVE and the GAPC, SDPC showed clear performance advantages, indicating that SDPC can effectively ensure VANETs' cluster stability in urban scenarios.
Attribute-graph clustering aims to divide the graph nodes into distinct clusters in an unsupervised manner, which usually encodes the node attribute feature and the corresponding graph structure into a latent feature space. However, traditional attribute-graph clustering methods often neglect the effect of neighbor information on clustering, leading to suboptimal clustering results as they fail to fully leverage the rich contextual information provided by neighboring nodes, which is crucial for capturing the intrinsic relationships between nodes and improving clustering performance. In this paper, we propose a novel Neighbor Dual-Consistency Constrained Attribute-Graph Clustering method that leverages information from neighboring nodes in two significant aspects: neighbor feature consistency and neighbor distribution consistency. To enhance feature consistency among nodes and their neighbors, we introduce a neighbor contrastive loss that encourages the embeddings of nodes to be closer to those of their similar neighbors in the feature space while pushing them further apart from dissimilar neighbors. This method helps the model better capture local feature information. Furthermore, to ensure consistent cluster assignments between nodes and their neighbors, we introduce a neighbor distribution consistency module, which combines structural information from the graph with similarity of attributes to align cluster assignments between nodes and their neighbors. By integrating both local structural information and global attribute information, our approach effectively captures comprehensive patterns within the graph. Overall, our method demonstrates superior performance in capturing comprehensive patterns within the graph and achieves state-of-the-art clustering results on multiple datasets.
Underwater wireless sensor networks (UWSNs) have emerged as a new paradigm of real-time organized systems, which are utilized in a diverse array of scenarios to manage the underwater environment surrounding them. One of the major challenges that these systems confront is topology control via clustering, which reduces the overload of wireless communications within a network and ensures low energy consumption and good scalability. This study presents a clustering technique in which the clustering process and cluster head (CH) selection are performed based on the Markov decision process and deep reinforcement learning (DRL). The DRL algorithm selects the CH by maximizing the defined reward function. Subsequently, the sensed data are collected by the CHs and then sent to the autonomous underwater vehicles. In the final phase, the energy consumed by each sensor is calculated, and its residual energy is updated. Then, the autonomous underwater vehicle performs all clustering and CH selection operations. This procedure continues until the sensors' power has been reduced to such an extent that no node can become a CH. Analysis of the findings and their comparison with alternative frameworks shows that this method can be used to control the cluster size and the number of CHs, which ultimately improves the energy usage of nodes and prolongs the lifespan of the network. Our simulation results illustrate that the suggested methodology surpasses the conventional low-energy adaptive clustering hierarchy, the distance- and energy-constrained K-means clustering scheme, and the vector-based forward protocol and is viable for deployment in an actual operational environment.
Active semi-supervised fuzzy clustering integrates fuzzy clustering techniques with limited labeled data, guided by active learning, to enhance classification accuracy, particularly in complex and ambiguous datasets. Although several active semi-supervised fuzzy clustering methods have been developed previously, they typically face significant limitations, including high computational complexity, sensitivity to initial cluster centroids, and difficulties in accurately managing boundary clusters where data points often overlap among multiple clusters. This study introduces a novel Active Semi-Supervised Fuzzy Clustering algorithm specifically designed to identify, analyze, and correct misclassified boundary elements. By strategically utilizing labeled data through active learning, our method improves the robustness and precision of cluster boundary assignments. Extensive experimental evaluations conducted on three types of datasets (benchmark UCI datasets, synthetic data with controlled boundary overlap, and satellite imagery) demonstrate that our proposed approach achieves superior performance in terms of clustering accuracy and robustness compared to existing active semi-supervised fuzzy clustering methods. The results confirm the effectiveness and practicality of our method in handling real-world scenarios where precise cluster boundaries are critical.
Funding: Supported by the National Science and Technology Major Project of China (No. 2011ZX05029-003) and the CNPC Science Research and Technology Development Project, China (No. 2013D-0904).
Abstract: In this study, we used the multi-resolution graph-based clustering (MRGC) method to determine the electrofacies (EF) and lithofacies (LF) from well log data obtained from the intraplatform bank gas fields located in the Amu Darya Basin. The MRGC automatically determines the optimal number of clusters without prior knowledge of the structure or number of clusters in the analyzed data set, and it allows users to control the level of detail needed to define the EF. Based on the LF identification and successful EF calibration using core data, an MRGC EF partition model comprising five clusters and a quantitative LF interpretation chart were constructed. EF clusters 1 to 5 were interpreted as lagoon, anhydrite flat, interbank, low-energy bank, and high-energy bank, respectively, and the coincidence rate in the cored interval reached 85%. We concluded that the MRGC can be accurately applied to predict the LF in non-cored but logged wells. Therefore, continuous EF clusters were partitioned and the corresponding LF were interpreted, and the distribution and petrophysical characteristics of different LF were analyzed in the framework of sequence stratigraphy.
Funding: Supported by the National Natural Science Foundation of China (12473105 and 12473106), the central government guides local funds for science and technology development (YDZJSX2024D049), and the Graduate Student Practice and Innovation Program of Shanxi Province (2024SJ313).
Abstract: As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and “others”) across a range of astronomical research over the past decade. The review focuses on six key application areas: stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. It provides an in-depth analysis of the performance and results of each method, considering their suitability for different data types. Additionally, it presents clustering algorithm selection strategies based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52069029, 52369026), the Belt and Road Special Foundation of the National Key Laboratory of Water Disaster Prevention (Grant No. 2023490411), the Yunnan Agricultural Basic Research Joint Special General Project (Grant Nos. 202501BD070001-060, 202401BD070001-071), the Construction Project of the Yunnan Key Laboratory of Water Security (No. 20254916CE340051), and the Youth Talent Project of the “Xingdian Talent Support Plan” in Yunnan Province (Grant No. XDYC-QNRC-2023-0412).
Abstract: Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges posed by complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a prediction model, a multipoint deformation prediction model for extra-high arch dams based on a clustering partition, namely the CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which filters out noise and improves the signal-to-noise ratio of the data, providing high-quality input for the subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and to ensure that measurement points within each partition share consistent deformation trends. Finally, CEEMDAN is used to decompose the monitoring data into components; each component is predicted and analyzed by combining KPCA (Kernel Principal Component Analysis) with the KELM (Kernel Extreme Learning Machine) optimized by the GSWOA (Global Search Whale Optimization Algorithm); and the component predictions are integrated via reconstruction to predict the overall deformation trend of the dam. An extra-high arch dam project is taken as an example, and the model is validated through a comparative analysis of multiple models. The results show that the proposed multipoint deformation prediction model can combine data from different measurement points, achieve a comprehensive and precise prediction of the deformation of extra-high arch dams, and provide strong technical support for their safe operation.
Funding: Supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health & Welfare, Republic of Korea (No. RS-2020-KH088726); the Patient-Centered Clinical Research Coordinating Center (PACEN), the Ministry of Health and Welfare, Republic of Korea (No. HC19C0276); and the National Research Foundation of Korea (NRF), the Korea Government (MSIT) (No. RS-2023-00247504).
Abstract: AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who had undergone at least 10 24-2 VF tests were included in this study. Using 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using the hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20 ± 14.38 years; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88 ± 15.22 years; 43 males and 38 females). All subjects were diagnosed with POAG. Fifty-two VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG when only limited VF data are available.
Funding: Funded by the Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Multi-view clustering is a critical research area in computer science that aims to extract meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, the MPFC algorithm attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 on the 3 Sources dataset. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
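The Purity (PUR) score quoted in the abstract above is a standard external clustering metric: each cluster is credited with the size of its majority ground-truth class, and the credits are summed and divided by the number of points. A minimal sketch follows, assuming plain label lists; the function name and toy labels are illustrative and are not taken from the MPFC paper.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of points that belong to the majority true class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one point is required")
    # Group the ground-truth labels by the cluster each point was assigned to.
    members = {}
    for truth, cluster in zip(true_labels, cluster_ids):
        members.setdefault(cluster, []).append(truth)
    # Each cluster contributes the count of its most frequent true label.
    majority_total = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six points, three clusters.
truth = ["a", "a", "a", "b", "b", "c"]
clusters = [0, 0, 1, 1, 1, 2]
print(purity(truth, clusters))
```

Purity rises trivially as the number of clusters grows, which is presumably why the abstract reports it alongside ACC rather than on its own.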
Funding: Supported by the Foundation of President of Hebei University (XZJJ202303).
Abstract: Federated learning is a machine learning framework designed to protect privacy by keeping training data on clients’ devices without sharing private data. It trains a global model through collaboration between clients and the server. However, data heterogeneity can make model training inefficient and can even reduce the final model’s accuracy and generalization capability. Meanwhile, data scarcity can result in suboptimal cluster distributions for few-shot clients in centralized clustering tasks, and standalone personalization tasks may cause severe overfitting. To address these limitations, we introduce a federated learning dual optimization model based on a clustering and personalization strategy (FedCPS). FedCPS adopts a decentralized approach in which clients identify their cluster membership locally without relying on a centralized clustering algorithm. Building on this, FedCPS introduces personalized training tasks locally, adding a regularization term to control deviations between local and cluster models. This improves the generalization ability of the final model while mitigating overfitting. The use of weight-sharing techniques also reduces the computational cost of central machines. Experimental results on the MNIST, FMNIST, CIFAR10, and CIFAR100 datasets demonstrate that our method achieves better personalization than other personalized federated learning methods, with an average test accuracy improvement of 0.81%–2.96%. We also adjusted the proportion of few-shot clients to evaluate the impact on accuracy across different methods. The experiments show that accuracy drops by only 0.2%–3.7% with FedCPS, compared to 2.1%–10% for existing methods. Our method demonstrates its advantages across diverse data environments.
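The abstract above does not state the form of the regularization term. One common way to "control deviations between local and cluster models" is a quadratic proximal penalty, and the sketch below assumes that form: the local objective is task_loss(w) + (mu / 2) * ||w - cluster_w||^2. The function name, `lr`, and `mu` are hypothetical and are not FedCPS's actual update rule.

```python
def regularized_local_step(local_w, cluster_w, task_grad, lr=0.1, mu=0.5):
    """One gradient step on task_loss(w) + (mu / 2) * ||w - cluster_w||^2.

    Assumed proximal-style penalty, shown only to illustrate the idea.
    """
    if not (len(local_w) == len(cluster_w) == len(task_grad)):
        raise ValueError("all vectors must have the same length")
    # The penalty's gradient, mu * (w - c), pulls the local model toward the cluster model.
    return [w - lr * (g + mu * (w - c)) for w, c, g in zip(local_w, cluster_w, task_grad)]


# With a zero task gradient, the step moves the weights only toward the cluster model.
print(regularized_local_step([1.0, -2.0], [0.0, 0.0], [0.0, 0.0]))
```

A larger `mu` keeps a few-shot client closer to its cluster model, which limits overfitting at the cost of less personalization.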
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Funding: Supported by the National Major Scientific Research Instrument Development Project of China (No. 51927804) and the Science Fund for Shaanxi Provincial Department of Education's Youth Innovation Team Research Plan (Grant No. 23JP169).
Abstract: In machine vision, elliptical targets frequently appear within the camera's region of interest (ROI). Ellipse detection is essential for shape detection and geometric measurement. However, existing ellipse detection algorithms often suffer from high computational complexity, strong dependence on initial conditions, sensitivity to noise, and lack of robustness to occlusions. In this paper, we propose a fast and robust ellipse detection method to address these challenges. The method first uses edge gradient and curvature information to segment curves into circular arcs. Next, based on the convexity of the arcs, it assigns them to different quadrants of the ellipse, then groups and fits the arcs according to multiple geometric constraints at a low computational cost. Finally, it reduces the parameter space for hierarchical clustering and then segments the complete ellipse into several sectors for verification. We evaluate our method on seven datasets, including five public image datasets and two from industrial camera scenes. Experimental results show that our method achieves a precision ranging from 67.1% to 98.9%, a recall ranging from 48.1% to 92.9%, and an F-measure ranging from 58.0% to 95.8%. The average execution time per image ranges from 25 ms to 192 ms, demonstrating both high accuracy and efficiency.
Funding: Supported by the National Natural Science Foundation of China (No. 62271399) and the National Key Research and Development Program of China (No. 2022YFB1807102).
Abstract: For multi-vehicle networks, the Cooperative Positioning (CP) technique has become a promising way to enhance vehicle positioning accuracy. In particular, CP performance can be further improved by introducing Sensor-Rich Vehicles (SRVs) into CP networks, which is called SRV-aided CP. However, in dense urban environments the CP system may split into several sub-clusters that cannot connect with each other, and the sub-clusters with few SRVs will suffer degraded CP performance. Since Unmanned Aerial Vehicles (UAVs) have been widely used to aid vehicular communications, we intend to use UAVs to assist sub-clusters in CP. In this paper, a UAV-aided CP network is constructed to fully utilize information from SRVs. First, the inter-node connection structure among the UAV and vehicles is designed to share available information from SRVs. Next, a clustering optimization strategy is proposed, in which the UAV cooperates with the high-precision sub-cluster to obtain available information from SRVs and then broadcasts this positioning-related information to the other, low-precision sub-clusters. Finally, the Locally-Centralized Factor Graph Optimization (LC-FGO) algorithm is designed to fuse positioning information from cooperators. Simulation results indicate that the positioning accuracy of the CP system can be improved by fully utilizing positioning-related information from SRVs.
Funding: Supported by the Spanish Ministry of Science and Innovation under Projects PID2022-137680OB-C32 and PID2022-139187OB-I00.
Abstract: Customer segmentation according to load-shape profiles using smart meter data is an increasingly important application, vital to the planning and operation of energy systems and to enabling citizens’ participation in the energy transition. This study proposes an innovative multi-step clustering procedure to segment customers based on load-shape patterns at the daily and intra-daily time horizons. Smart meter data is split between daily and hourly normalized time series to assess monthly, weekly, daily, and hourly seasonality patterns separately. The dimensionality reduction implicit in the splitting allows a direct approach to clustering raw daily energy time series data. The intraday clustering procedure sequentially identifies representative hourly day-unit profiles for each customer and for the entire population. For the first time, a step function approach is applied to reduce time series dimensionality. Customer attributes embedded in surveys are employed to build external clustering validation metrics using Cramer’s V correlation factors and to identify statistically significant determinants of load shape in energy usage. In addition, a time series feature engineering approach is used to extract 16 relevant demand flexibility indicators that characterize customers and corresponding clusters along four different axes: available Energy (E), Temporal patterns (T), Consistency (C), and Variability (V). The methodology is implemented on a real-world electricity consumption dataset of 325 Small and Medium-sized Enterprise (SME) customers, identifying 4 daily and 6 hourly easy-to-interpret, well-defined clusters. The application of the methodology includes selecting key parameters via grid search and a thorough comparison of clustering distances and methods to ensure the robustness of the results. Further research can test the scalability of the methodology to larger datasets from various customer segments (households and large commercial) and locations with different weather and socioeconomic conditions.
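The external validation step in the abstract above relies on Cramer’s V, which measures the association between two categorical variables (here, a cluster label and a survey attribute) as sqrt(chi2 / (n * (min(r, c) - 1))) for an r-by-c contingency table with n observations. A minimal standard-library sketch follows; the attribute values are hypothetical and no bias correction is applied.

```python
import math


def cramers_v(xs, ys):
    """Cramer's V between two equally long sequences of categorical labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    rows = list(dict.fromkeys(xs))  # distinct categories, in first-seen order
    cols = list(dict.fromkeys(ys))
    if min(len(rows), len(cols)) < 2:
        raise ValueError("each variable needs at least two categories")
    # Contingency table stored sparsely as {(row, col): count}.
    counts = {}
    for x, y in zip(xs, ys):
        counts[(x, y)] = counts.get((x, y), 0) + 1
    row_tot = {r: sum(counts.get((r, c), 0) for c in cols) for r in rows}
    col_tot = {c: sum(counts.get((r, c), 0) for r in rows) for c in cols}
    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (counts.get((r, c), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


# Hypothetical example: cluster labels versus a surveyed business sector.
cluster_labels = [0, 0, 1, 1]
sector = ["retail", "retail", "office", "office"]
print(cramers_v(cluster_labels, sector))
```

Values near 1 indicate that the survey attribute almost determines cluster membership; values near 0 indicate no association.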
Funding: Supported by the National Natural Science Foundation of China (No. 62134004).
Abstract: To enhance the denoising performance of event-based sensors, we introduce a clustering-based temporal deep neural network denoising method (CBTDNN). First, a combination of density-based spatial clustering of applications with noise (DBSCAN) and K-means++ is used to cluster the sensor output data and obtain the respective cluster centers. Subsequently, long short-term memory (LSTM) is employed to fit the data and yield optimized cluster centers that incorporate temporal information. Lastly, based on the new cluster centers and the denoising ratio, a radius threshold is set, and noise points beyond this threshold are removed. The comprehensive denoising metric, the F1 score, of CBTDNN reaches 0.8931, 0.7735, and 0.9215 on the traffic sequences dataset, pedestrian detection dataset, and turntable dataset, respectively. These metrics represent improvements of 49.90%, 33.07%, 19.31%, and 22.97% over four comparison algorithms, namely nearest neighbor (NNb), nearest neighbor with polarity (NNp), Autoencoder, and multilayer perceptron denoising filter (MLPF), respectively. These results demonstrate that the proposed method enhances the denoising performance of event-based sensors.
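The final CBTDNN step, removing points beyond a radius threshold around each cluster center, can be sketched as follows. The abstract does not say how the denoising ratio maps to a radius, so this sketch assumes the radius is chosen so that roughly that fraction of the farthest events is discarded; `filter_by_radius` and its parameters are hypothetical names, and the center is passed in rather than produced by DBSCAN, K-means++, or an LSTM.

```python
import math


def filter_by_radius(events, center, denoise_ratio):
    """Keep events within a radius chosen so that about `denoise_ratio` of them are dropped.

    Assumed interpretation of the "denoising ratio"; not the paper's exact rule.
    """
    if not 0.0 <= denoise_ratio < 1.0:
        raise ValueError("denoise_ratio must be in [0, 1)")
    if not events:
        return [], 0.0
    # Sort distances to the center and take the cut-off at the kept fraction.
    dists = sorted(math.dist(e, center) for e in events)
    keep_count = max(1, math.ceil(len(dists) * (1.0 - denoise_ratio)))
    radius = dists[keep_count - 1]
    kept = [e for e in events if math.dist(e, center) <= radius]
    return kept, radius


# Hypothetical events as (x, y) pixel coordinates around one cluster center.
events = [(0, 0), (1, 0), (0, 2), (10, 10)]
kept, radius = filter_by_radius(events, center=(0, 0), denoise_ratio=0.25)
print(kept, radius)
```

Ties at the cut-off distance are all kept, so slightly fewer events than the nominal ratio may be removed.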
Funding: Supported by the National Key R&D Program of China (2023YFC3304600).
Abstract: Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationships between samples. To overcome this challenge, this paper proposes a novel multi-order neighborhood fusion based multi-view deep subspace clustering model. We integrate the multi-order proximity graph structures of different views into the self-expressive layer through a multi-order neighborhood fusion module. With this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix; we can then obtain an optimal global affinity matrix in which each connected node belongs to one cluster. In addition, a discriminative constraint between views is designed to further improve clustering performance. A range of experiments on six public datasets demonstrates that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
Funding: Financially supported by the vice chancellor for research and technology of Urmia University.
Abstract: Numerous clustering algorithms are valuable for pattern recognition in forest vegetation, and new ones are continually being proposed. While some are well known, others are underutilized in vegetation science. This study compares the performance of practical iterative reallocation algorithms with model-based clustering algorithms. The data come from forest vegetation in Virginia (United States), the Hyrcanian Forest (Asia), and European beech forests. Practical iterative reallocation algorithms were applied as non-hierarchical methods, and finite Gaussian mixture modeling was used as a model-based clustering method. Because of limitations on dimensionality in model-based clustering, principal coordinates analysis was employed to reduce the dataset’s dimensions. A log transformation was applied to achieve a normal distribution for the pseudo-species data before calculating the Bray-Curtis dissimilarity. The findings indicate that the reallocation of misclassified objects based on silhouette width (OPTSIL) with Flexible-β (-0.25) had the highest mean among the tested clustering algorithms, with Silhouette width 1 (REMOS1) with Flexible-β (-0.25) second. Model-based clustering, however, performed poorly. Based on these results, we recommend using OPTSIL with Flexible-β (-0.25) and REMOS1 with Flexible-β (-0.25) for forest vegetation classification instead of model-based clustering, particularly for the heterogeneous datasets common in forest vegetation community data.
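The Bray-Curtis dissimilarity mentioned in the abstract above is, for two non-negative abundance vectors u and v, sum(|u_i - v_i|) / sum(u_i + v_i). A minimal sketch follows; the abstract does not give the exact log transformation, so `log1p` (that is, log(1 + x)) is assumed here purely for illustration, and the plot data are made up.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one abundance must be positive")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def log_transform(abundances):
    """Assumed transformation: log(1 + x), which keeps zero abundances at zero."""
    return [math.log1p(a) for a in abundances]


# Hypothetical species abundances in two vegetation plots.
plot_a = [6, 7, 4]
plot_b = [10, 0, 6]
print(bray_curtis(plot_a, plot_b))
print(bray_curtis(log_transform(plot_a), log_transform(plot_b)))
```

The transformation reduces the weight of dominant species, so the two printed values generally differ.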
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42007269), the Young Talent Fund of Xi'an Association for Science and Technology (Grant No. 959202313094), and the Fundamental Research Funds for the Central Universities, CHD (Grant No. 300102263401).
Abstract: The characterization and clustering of rock discontinuity sets is a crucial and challenging task in rock mechanics and geotechnical engineering. Over the past few decades, the clustering of discontinuity sets has undergone rapid and remarkable development. However, no literature summarizes these achievements, and this paper attempts to describe the current status and prospects of the field. Specifically, this review discusses the development of clustering methods for discontinuity sets and the state-of-the-art algorithms. First, we introduce the importance of discontinuity clustering analysis and then describe comprehensive characterization approaches for discontinuity data. A bibliometric analysis is subsequently conducted to clarify the current status and development characteristics of the clustering of discontinuity sets. The methods for the clustering analysis of rock discontinuities are reviewed in terms of single- and multi-parameter clustering methods. Single-parameter methods can be classified into empirical judgment methods, dynamic clustering methods, relative static clustering methods, and static clustering methods, reflecting the continuous optimization and improvement of clustering algorithms. Moreover, this paper compares the current mainstream single-parameter clustering methods with multi-parameter clustering methods. It emphasizes that current single-parameter clustering methods have reached their performance limits, with little room for improvement, and that the study of multi-parameter clustering methods needs to be extended. Finally, several suggestions are offered for future research on the clustering of discontinuity sets.
Abstract: The recently proposed symplectic symmetry approach to clustering (SSAC) in atomic nuclei is modified and developed in more detail. It is first applied to the light two-cluster $^{20}$Ne + $\alpha$ system of $^{24}$Mg, the latter exhibiting well-developed low-energy $K^{\pi}=0_{1}^{+}$, $K^{\pi}=2_{1}^{+}$, and $K^{\pi}=0_{1}^{-}$ rotational bands in its spectrum. A simple algebraic Hamiltonian, consisting of dynamical symmetry, residual, and vertical mixing parts, is used to describe these three lowest rotational bands of positive and negative parity in $^{24}$Mg. A good description of the excitation energies is obtained by considering only the SU(3) cluster states restricted to the stretched many-particle Hilbert subspace, built on the leading Pauli-allowed SU(3) multiplet for the positive- and negative-parity states, respectively. The coupling to the higher cluster-model configurations allows us to describe the known low-lying experimentally observed B(E2) transition probabilities within and between the cluster states of the three bands under consideration without the use of an effective charge.
Abstract: Cluster-based models have numerous application scenarios in vehicular ad-hoc networks (VANETs) and can greatly improve the communication performance of VANETs. However, the frequent movement of vehicles often changes the network topology, thereby reducing cluster stability in urban scenarios. To address this issue, we propose a clustering model, named SDPC, based on the density peak clustering (DPC) method and the sparrow search algorithm (SSA). First, the model constructs a fitness function based on the parameters obtained from the DPC method and deploys the SSA for iterative optimization to select cluster heads (CHs). Then, the vehicles that have not been selected as CHs are assigned to appropriate clusters by jointly considering a distance parameter and a link-reliability parameter. Finally, cluster maintenance strategies are applied to handle changes in the clusters’ organizational structure. To verify the performance of the model, we conducted a simulation on a real-world scenario using multiple metrics related to cluster stability. The results show that, compared with APROVE and GAPC, SDPC has clear performance advantages, indicating that SDPC can effectively ensure VANET cluster stability in urban scenarios.
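The "parameters obtained from the DPC method" are, in standard density peak clustering, a local density rho (the number of neighbors within a cutoff distance) and a separation delta (the distance to the nearest point of higher density). Points with both values large are natural cluster-head candidates. The sketch below computes the two quantities on hypothetical 2-D vehicle positions; it does not reproduce SDPC's fitness function or the SSA optimization.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta) per point, following the usual DPC definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbor get the largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical vehicle positions forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
          (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
rho, delta = density_peaks(points, cutoff=0.8)
gamma = [r * d for r, d in zip(rho, delta)]
# The two largest gamma values mark the candidate cluster heads.
heads = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
print(sorted(heads))
```

Ranking by the product rho * delta is one common heuristic for picking centers; a model like SDPC would presumably fold further terms such as link reliability into its own fitness function.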
Funding: Supported by the National Natural Science Foundation of China (Nos. 62272015, 62441232).
Abstract: Attribute-graph clustering aims to divide graph nodes into distinct clusters in an unsupervised manner, usually by encoding the node attribute features and the corresponding graph structure into a latent feature space. However, traditional attribute-graph clustering methods often neglect the effect of neighbor information on clustering. They therefore fail to fully leverage the rich contextual information provided by neighboring nodes, which is crucial for capturing the intrinsic relationships between nodes, and this leads to suboptimal clustering results. In this paper, we propose a novel Neighbor Dual-Consistency Constrained Attribute-Graph Clustering method that leverages information from neighboring nodes in two significant aspects: neighbor feature consistency and neighbor distribution consistency. To enhance feature consistency among nodes and their neighbors, we introduce a neighbor contrastive loss that encourages the embeddings of nodes to be closer to those of their similar neighbors in the feature space while pushing them further apart from dissimilar neighbors. This helps the model better capture local feature information. Furthermore, to ensure consistent cluster assignments between nodes and their neighbors, we introduce a neighbor distribution consistency module, which combines structural information from the graph with attribute similarity to align cluster assignments between nodes and their neighbors. By integrating both local structural information and global attribute information, our approach captures comprehensive patterns within the graph and achieves state-of-the-art clustering results on multiple datasets.
Abstract: Underwater wireless sensor networks (UWSNs) have emerged as a new paradigm of real-time organized systems, which are used in a diverse array of scenarios to manage the underwater environment surrounding them. One of the major challenges these systems confront is topology control via clustering, which reduces the overload of wireless communications within a network and ensures low energy consumption and good scalability. This study presents a clustering technique in which the clustering process and cluster head (CH) selection are performed based on the Markov decision process and deep reinforcement learning (DRL). The DRL algorithm selects the CH by maximizing the defined reward function. Subsequently, the sensed data are collected by the CHs and then sent to the autonomous underwater vehicles. In the final phase, the energy consumed by each sensor is calculated, and its residual energy is updated. Then, the autonomous underwater vehicle performs all clustering and CH selection operations. This procedure continues until the sensors’ power has been reduced to such an extent that no node can become a CH. Analysis of the findings and their comparison with alternative frameworks shows that the method can be used to control the cluster size and the number of CHs, which ultimately improves the energy usage of nodes and prolongs the lifespan of the network. Our simulation results show that the proposed method outperforms the conventional low-energy adaptive clustering hierarchy, the distance- and energy-constrained K-means clustering scheme, and the vector-based forward protocol, and that it is viable for deployment in an actual operational environment.
Abstract: Active semi-supervised fuzzy clustering integrates fuzzy clustering techniques with limited labeled data, guided by active learning, to enhance classification accuracy, particularly in complex and ambiguous datasets. Although several active semi-supervised fuzzy clustering methods have been developed previously, they typically face significant limitations, including high computational complexity, sensitivity to initial cluster centroids, and difficulties in accurately managing boundary clusters where data points often overlap among multiple clusters. This study introduces a novel Active Semi-Supervised Fuzzy Clustering algorithm specifically designed to identify, analyze, and correct misclassified boundary elements. By strategically using labeled data through active learning, our method improves the robustness and precision of cluster boundary assignments. Extensive experimental evaluations on three types of datasets (benchmark UCI datasets, synthetic data with controlled boundary overlap, and satellite imagery) demonstrate that the proposed approach achieves higher clustering accuracy and robustness than existing active semi-supervised fuzzy clustering methods. The results confirm the effectiveness and practicality of our method in real-world scenarios where precise cluster boundaries are critical.