Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to use a small number of labeled samples, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments on several benchmark satellite datasets show that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860, highlighting its robustness and efficiency. These results indicate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
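The must-link and cannot-link constraints mentioned in this abstract can be illustrated with a small sketch. It is not the SS-MPFC objective, which the abstract does not spell out; it only counts how many pairwise constraints a hard cluster assignment violates. The function name `constraint_penalty` and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]


def constraint_penalty(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> int:
    """Count violated pairwise constraints for a hard cluster assignment.

    Illustrative only: a real semi-supervised objective would typically
    weight such violations and apply them to soft memberships.
    """
    # A must-link pair is violated when its two samples land in different clusters.
    ml_violations = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two samples share a cluster.
    cl_violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_violations + cl_violations


# Hypothetical assignment of five pixels to three clusters.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]    # (2, 4) is violated: clusters 1 and 2
cannot_link = [(0, 2), (2, 3)]  # (2, 3) is violated: both in cluster 1
penalty = constraint_penalty(labels, must_link, cannot_link)
print(penalty)
```

A penalty of this kind is usually added to the clustering loss so that assignments contradicting the few available labels become more expensive.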
Multi-view clustering is a critical research area in computer science that aims to extract meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, MPFC attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 on the 3 Sources dataset, underscoring its robustness and efficiency. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
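The three-level representation can be made concrete with a short sketch. It assumes the usual picture fuzzy set definition, in which an element carries positive, neutral, and negative degrees whose sum is at most 1, and the refusal degree is whatever remains. Whether MPFC uses exactly this parameterization is not stated in the abstract, and the function name `refusal_degree` is hypothetical.

```python
def refusal_degree(positive: float, neutral: float, negative: float) -> float:
    """Return the refusal degree of a picture fuzzy element.

    Assumes the standard definition: each degree lies in [0, 1], the three
    degrees sum to at most 1, and refusal = 1 - (positive + neutral + negative).
    """
    degrees = (positive, neutral, negative)
    if any(d < 0.0 or d > 1.0 for d in degrees):
        raise ValueError("each degree must lie in [0, 1]")
    total = sum(degrees)
    if total > 1.0 + 1e-12:
        raise ValueError("degrees must sum to at most 1")
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - total)


# Hypothetical element: mostly a member, somewhat neutral, slightly a non-member.
r = refusal_degree(0.5, 0.2, 0.1)
print(round(r, 6))
```

An ordinary fuzzy membership is the special case with a single degree. The extra degrees let a sample be marked as undecided rather than forced into a cluster.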
Binary multi-view clustering has attracted intense attention from researchers due to its efficiency in handling large-scale datasets. However, previous clustering approaches suffer from at least two limitations. First, they ignore correlations among the features of the original data. As a result, the geometric consistency of the data is not preserved in the binary representation space to be learned. Second, redundant and noisy features in the original data inevitably limit the final clustering performance. In light of this, we propose a novel discriminative binary multi-view clustering (DBMVC) method to address these issues. Specifically, DBMVC first maps the original data onto the Hamming space to obtain binary codes, which reduces the computational complexity and storage costs of the subsequent steps. To select useful features from the original data and obtain a discriminative representation, a norm constraint is imposed on the feature projection matrix. In addition, a graph regularization term is introduced to preserve the local manifold structure of the learned binary representation. Finally, an alternating iterative optimization algorithm is designed to solve the resulting objective function. Comprehensive experiments on six large-scale multi-view datasets validate that DBMVC markedly outperforms other state-of-the-art methods in terms of effectiveness and efficiency.
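Mapping data onto the Hamming space can be sketched as thresholding a linear projection. The sketch below is a generic illustration, not the DBMVC procedure: the projection matrix is fixed by hand rather than learned, and the names `binary_code` and `hamming_distance` are hypothetical.

```python
from typing import List, Sequence


def binary_code(x: Sequence[float], projection: Sequence[Sequence[float]]) -> List[int]:
    """Map a real vector to bits: bit k is 1 when row k of `projection`
    gives a non-negative score for x, else 0."""
    return [1 if sum(w * v for w, v in zip(row, x)) >= 0 else 0 for row in projection]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(p != q for p, q in zip(a, b))


# Hand-picked 3-bit projection for 2-D inputs (an assumption, not a learned matrix).
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
code_a = binary_code([0.5, -2.0], projection)
code_b = binary_code([-1.0, 3.0], projection)
distance = hamming_distance(code_a, code_b)
print(code_a, code_b, distance)
```

Comparing bit codes needs only bit-level operations and a few bytes per sample, which is where the storage and computation savings of binary clustering come from.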
As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and "others") across a range of astronomical research over the past decade. The review focuses on six key application areas: stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. It provides an in-depth analysis of the performance and results of each method, considering their suitability for different data types. Additionally, it presents strategies for selecting a clustering algorithm based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Multichannel signals are characterized by both information diversity and information consistency. To better explore and use the affinity relationships within multichannel signals, a new graph learning technique based on low-rank tensor approximation is proposed for multichannel monitoring signal processing. First, the affinity relationship of the multichannel signals is obtained from the clustering results of each channel signal, and an affinity tensor is constructed to integrate the diverse and consistent clustering information across channels. Second, a low-rank tensor optimization model is built, and the joint affinity matrix is optimized with the assistance of the strong-confidence affinity matrix. Solving the optimization model yields the fused affinity relationship graph of the multichannel signals. Finally, the fused multichannel clustering results are obtained through the updated joint affinity relationship graph. Application examples in health state assessment with public datasets and in microwave detection with real echoes verify the advantages and effectiveness of the proposed method.
Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationships between samples. To overcome this challenge, this paper proposes a novel multi-view deep subspace clustering model based on multi-order neighborhood fusion. We integrate the multi-order proximity graph structures of different views into the self-expressive layer through a multi-order neighborhood fusion module. With this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix, and we can then obtain an optimal global affinity matrix in which each connected node belongs to one cluster. In addition, a discriminative constraint between views is designed to further improve clustering performance. Experiments on six public datasets demonstrate that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
This paper proposes an equivalent modeling method for photovoltaic (PV) power stations via a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude–frequency curves of different filter parameters are analyzed. Based on the results, a grouping parameter set for characterizing the external filter characteristics is established. These parameters are then defined as clustering parameters. A single PV inverter model is established as a prerequisite. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
The Intrusion Detection System (IDS) is a security mechanism developed to observe network traffic and recognize suspicious or malicious activities. Clustering algorithms are often incorporated into IDS; however, conventional clustering-based methods face notable drawbacks, including poor scalability on high-dimensional datasets and a strong dependence of outcomes on initial conditions. To overcome the performance limitations of existing methods, this study proposes a novel quantum-inspired clustering algorithm that relies on a similarity coefficient-based quantum genetic algorithm (SC-QGA) and an improved quantum artificial bee colony algorithm hybridized with K-means (IQABC-K). First, the SC-QGA algorithm is constructed based on quantum computing and integrates similarity coefficient theory to strengthen genetic diversity and feature extraction capabilities. For the subsequent clustering phase, the IQABC-K algorithm is enhanced with adaptive rotation gate and movement exploitation strategies to balance global exploration against local exploitation. In addition, a global optimum bootstrap strategy and a linear population reduction strategy accelerate convergence toward the global optimum and reduce computational complexity. In an experimental evaluation against multiple algorithms and with diverse performance metrics, the proposed algorithm achieves accuracies of 98.57%, 98.81%, and 98.32% on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets, respectively. These results affirm its potential as an effective solution for practical clustering applications.
Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges of complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a prediction model, a multipoint deformation prediction model for extra-high arch dams based on clustering partitions, namely CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which filters out noise and improves the signal-to-noise ratio, providing high-quality input data for the subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and ensure the consistency of deformation trends at measurement points within each partition. Finally, CEEMDAN is used to decompose the monitoring data; each component is predicted and analyzed by combining Kernel Principal Component Analysis (KPCA) with a Kernel Extreme Learning Machine (KELM) optimized by the Global Search Whale Optimization Algorithm (GSWOA); and the component predictions are integrated via reconstruction to predict the overall deformation trend of the dam. An extra-high arch dam project is taken as an example, and the model is validated via a comparative analysis of multiple models. The results show that the proposed multipoint deformation prediction model can combine data from different measurement points, achieve a comprehensive and precise prediction of the deformation of extra-high arch dams, and provide strong technical support for safe operation.
AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who underwent at least 10 24-2 VF tests were included in this study. Using 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and of pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20 ± 14.38 y; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88 ± 15.22 y; 43 males and 38 females). All subjects were diagnosed with POAG. Fifty-two VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG with limited VF data.
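The regression step in METHODS can be sketched as an ordinary least-squares trend fitted to the first n tests and extrapolated to the 10th. The abstract does not say how values within a clustered region are combined, so averaging the TDV series of the points in a region is an assumption made here for illustration. The function names and the sample values are hypothetical and are not study data.

```python
from typing import Dict, List, Sequence


def predict_by_linear_trend(values: Sequence[float], target_index: int) -> float:
    """Fit y = a + b * t by ordinary least squares to tests t = 1..n,
    then evaluate the fitted line at `target_index`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two tests to fit a line")
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept + slope * target_index


def cluster_mean_series(
    series_by_point: Dict[int, List[float]], cluster: Sequence[int]
) -> List[float]:
    """Average the TDV series of the VF points in one cluster, test by test."""
    members = [series_by_point[p] for p in cluster]
    return [sum(vals) / len(vals) for vals in zip(*members)]


# Made-up TDVs (dB) for two VF points over the first three tests.
series_by_point = {0: [-1.0, -2.0, -3.0], 1: [-3.0, -4.0, -5.0]}
region = cluster_mean_series(series_by_point, [0, 1])
prediction = predict_by_linear_trend(region, 10)
print(region, prediction)
```

Pooling points within a region smooths test-to-test noise before fitting, which is one plausible reason a clustered fit could beat pointwise regression when only a few tests are available.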
The increasing prevalence of multi-view data has made multi-view clustering a crucial technique for discovering latent structures from heterogeneous representations. However, traditional fuzzy clustering algorithms handle the inherent uncertainty and imprecision of such data poorly, as they rely on a single-dimensional membership value. To overcome these limitations, we propose an auto-weighted multi-view neutrosophic fuzzy clustering (AW-MVNFC) algorithm. Our method leverages the neutrosophic framework, an extension of fuzzy sets, to explicitly model imprecision and ambiguity through three membership degrees. The core novelty of AW-MVNFC lies in a hierarchical weighting strategy that adaptively learns both the contribution of each data view and the importance of each feature within a view. Through a unified objective function, AW-MVNFC jointly optimizes the neutrosophic membership assignments, the cluster centers, and the distributions of view and feature weights. Comprehensive experiments on synthetic and real-world datasets show that our algorithm achieves more accurate and stable clustering than existing methods, demonstrating its effectiveness in handling the complexities of multi-view data.
Various factors, including weak tie-lines in electric power system (EPS) networks, can lead to low-frequency oscillations (LFOs), which do not pose an immediate threat but are slow-acting and damaging. To address this challenge, this article proposes a clustering-based machine learning (ML) framework that enhances the stability of EPS networks by suppressing LFOs through real-time tuning of key power system stabilizer (PSS) parameters. To validate the proposed strategy, two distinct EPS networks are selected: the single-machine infinite-bus (SMIB) system with a single-stage PSS and the unified power flow controller (UPFC) coordinated SMIB system with a double-stage PSS. To generate data under various loading conditions for both networks, an efficient but offline meta-heuristic algorithm, the grey wolf optimizer (GWO), is used, with the loading conditions as inputs and the key PSS parameters as outputs. The generated loading conditions are then clustered using the fuzzy k-means (FKM) clustering method. Finally, group method of data handling (GMDH) and long short-term memory (LSTM) ML models are developed on the clustered data to predict the key PSS parameters in real time for any loading condition. Several well-known statistical performance indices (SPI) are used to validate the training and testing of the FKM-GMDH and FKM-LSTM models and to assess their robustness in predicting PSS parameters. The performance of the ML models is also evaluated using three stability indices (minimum damping ratio, eigenvalues, and time-domain simulations) after the PSS is tuned with the real-time estimated parameters under changing operating conditions. In addition, the outputs of the offline (GWO-based) meta-heuristic model, the proposed real-time (FKM-GMDH and FKM-LSTM) ML models, and previously reported models are compared. According to the results, the proposed methodology outperforms the others in enhancing the stability of the selected EPS networks by damping the observed LFOs under various loading conditions.
Real-world data can often be represented in multiple forms and views, and analyzing data from different perspectives allows for more comprehensive learning and therefore better clustering results. Non-negative matrix factorization (NMF) is used in clustering to extract uniform, discriminative low-dimensional features from multi-view data. Many clustering methods based on graph regularization have been proposed and proven effective, but ordinary graphs consider only pairwise relationships between samples. To learn the higher-order relationships that exist in the sample manifold and feature manifold of multi-view data, we propose a new semi-supervised multi-view clustering method called dual hypergraph regularized partially shared non-negative matrix factorization (DHPS-NMF). The complex manifold structure of samples and features is learned by constructing sample and feature hypergraphs. To improve the discriminative power of the obtained low-dimensional features, semi-supervised regression terms are incorporated into the model so that label information is used when capturing the complex manifold structure of the data. We conduct experiments on six real datasets, and the results show that our algorithm achieves encouraging results in comparison with several existing methods.
Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be given the correct number of clusters. In addition, these existing algorithms create only hard and fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. Hard and fuzzy partitions ignore the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a two-dimensional human-readable chart by calculating two newly defined metrics for each object. From this chart, users can detect the number of clusters and select several objects in the dataset as cluster centers. SRMVEC then derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. In addition, SRMVEC outperforms some state-of-the-art methods on benchmark datasets.
To realize intelligent mechanization of the last process in the fruit industry chain, the identification of fruit packing boxes is studied. A multi-view database is established to describe the omnidirectional attitudes of the fruit packing boxes. To reduce the data redundancy caused by multi-view acquisition, a new binary multi-view kernel principal component analysis network (BMKPCANet) is built, and a multi-view recognition method for fruit packing boxes is proposed based on BMKPCANet and a support vector machine (SVM). The experimental results show that the recognition accuracy of the proposed BMKPCANet is on average 12.82% higher than that of PCANet and 3.51% higher than that of KPCANet. Its time consumption is on average 7.74% lower than that of PCANet and 29.01% lower than that of KPCANet. This work lays a theoretical foundation for multi-view recognition of 3D objects and has good practical application value.
As a class of effective methods for incomplete multi-view clustering, graph-based algorithms have recently drawn wide attention. However, most of them could use further improvement in the following aspects. First, in some graph-based models, all views are forced to share a common similarity graph regardless of the severe consistency degeneration due to incomplete views. Next, similarity graph construction and cluster analysis are sometimes performed separately. Finally, the contribution difference of individual views is not always carefully considered. To address these issues simultaneously, this paper proposes an incomplete multi-view clustering algorithm based on auto-weighted fusion in partition space. In our algorithm, the information of cluster structure is introduced into the process of similarity learning to construct a desirable similarity graph, information fusion is performed in partition space to alleviate the negative impact brought about by consistency degradation, and all views are adaptively weighted to reflect their different contributions to clustering tasks. Finally, all the subtasks are collaboratively optimized in a unified framework to reach an overall optimal result. Experimental results show that the proposed method compares favorably with the state-of-the-art methods.
Existing multi-view subspace clustering algorithms based on tensor singular value decomposition (t-SVD) predominantly use the tensor nuclear norm to explore the intra-view correlation between views of the same samples, while neglecting the correlation among the samples within different views. Moreover, the tensor nuclear norm is only a convex approximation of the tensor rank function, and treating different singular values equally may result in a suboptimal tensor representation. A hypergraph regularized multi-view subspace clustering algorithm with dual tensor log-determinant (HRMSC-DTL) is proposed. The algorithm uses subspace learning in each view to learn a specific set of affinity matrices and introduces a non-convex tensor log-determinant function to replace the tensor nuclear norm and better capture global low-rankness. It also introduces hyper-Laplacian regularization to preserve the local geometric structure embedded in the high-dimensional space. Furthermore, it rotates the original tensor and incorporates a dual tensor mechanism to fully exploit the intra-view correlation of the original tensor and the inter-view correlation of the rotated tensor. An alternating direction method of multipliers (ADMM) is designed to solve the non-convex optimization model. Experimental evaluations on seven widely used datasets, along with comparisons to several state-of-the-art algorithms, demonstrate the superiority and effectiveness of HRMSC-DTL in terms of clustering performance.
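The remark about treating singular values equally can be illustrated on plain lists of singular values. The sketch assumes one commonly used log-determinant surrogate, log det(I + XᵀX), which equals the sum of log(1 + σ²) over the singular values σ. The exact function used in HRMSC-DTL is not given in the abstract, and both function names are hypothetical.

```python
import math
from typing import Sequence


def nuclear_norm(singular_values: Sequence[float]) -> float:
    """Sum of singular values: every unit of magnitude costs the same."""
    return sum(singular_values)


def log_det_surrogate(singular_values: Sequence[float]) -> float:
    """Sum of log(1 + s^2), i.e. log det(I + X^T X) written in terms of the
    singular values s of X. Large values are penalized far less than linearly."""
    return sum(math.log(1.0 + s * s) for s in singular_values)


# Hypothetical spectrum: one dominant component plus two small ones.
base = [10.0, 1.0, 0.1]
boosted = [20.0, 1.0, 0.1]  # only the dominant singular value is doubled

nuclear_increase = nuclear_norm(boosted) - nuclear_norm(base)
logdet_increase = log_det_surrogate(boosted) - log_det_surrogate(base)
print(nuclear_increase, logdet_increase)
```

Doubling the dominant singular value raises the nuclear norm by the full 10 units but raises the log-determinant surrogate by well under 2, so the surrogate is more tolerant of the large singular values that carry the main structure.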
It is challenging to cluster multi-view data in which the clusters have overlapping areas. Existing multi-view clustering methods often misclassify the indistinguishable objects in overlapping areas by forcing them into single clusters, increasing clustering errors. Our solution, the multi-view dynamic kernelized evidential clustering method (MvDKE), addresses this by assigning these objects to meta-clusters, a union of several related singleton clusters, effectively capturing the local imprecision in overlapping areas. MvDKE offers two main advantages: first, it significantly reduces computational complexity through a dynamic framework for evidential clustering, and second, it handles non-spherical data using kernel techniques within its objective function. Experiments on various datasets confirm MvDKE's superior ability to accurately characterize the local imprecision in multi-view non-spherical data, achieving better efficiency and outperforming existing methods in overall performance.
In the big data era, data are generated from different sources or observed from different views. Such data are referred to as multi-view data. Unleashing the power of knowledge in multi-view data is very important in big data mining and analysis. This calls for advanced techniques that consider the diversity of different views while fusing these data. Multi-view Clustering (MvC) has attracted increasing attention in recent years by aiming to exploit complementary and consensus information across multiple views. This paper summarizes a large number of multi-view clustering algorithms, provides a taxonomy according to the mechanisms and principles involved, and classifies these algorithms into five categories, namely co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. Multi-view graph clustering is further categorized into graph-based, network-based, and spectral-based methods. Multi-view subspace clustering is further divided into subspace learning-based and non-negative matrix factorization-based methods. This paper not only introduces the mechanisms of each category of methods but also gives a few examples of how these techniques are used. In addition, it lists some publicly available multi-view datasets. Overall, this paper serves as an introductory text and survey for multi-view clustering.
Federated learning is a machine learning framework designed to protect privacy by keeping training data on clients' devices without sharing private data. It trains a global model through collaboration between clients and the server. However, data heterogeneity can lead to inefficient model training and even reduce the final model's accuracy and generalization capability. Meanwhile, data scarcity can result in suboptimal cluster distributions for few-shot clients in centralized clustering tasks, and standalone personalization tasks may cause severe overfitting. To address these limitations, we introduce a federated learning dual optimization model based on clustering and personalization strategy (FedCPS). FedCPS adopts a decentralized approach, where clients identify their cluster membership locally without relying on a centralized clustering algorithm. Building on this, FedCPS introduces personalized training tasks locally, adding a regularization term to control deviations between local and cluster models. This improves the generalization ability of the final model while mitigating overfitting. The use of weight-sharing techniques also reduces the computational cost of central machines. Experimental results on the MNIST, FMNIST, CIFAR10, and CIFAR100 datasets demonstrate that our method achieves better personalization than other personalized federated learning methods, with an average test accuracy improvement of 0.81%–2.96%. We also varied the proportion of few-shot clients to evaluate the impact on accuracy across different methods. In these experiments, accuracy under FedCPS drops by only 0.2%–3.7%, compared to 2.1%–10% for existing methods. Our method demonstrates its advantages across diverse data environments.
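A regularization term that controls the deviation between local and cluster models can be sketched as a squared L2 proximal penalty. The abstract does not give the exact form used by FedCPS, so the penalty (lam / 2) * ||w - w_cluster||², the plain gradient step, and the name `regularized_step` are all assumptions for illustration.

```python
from typing import List, Sequence


def regularized_step(
    local_w: Sequence[float],
    cluster_w: Sequence[float],
    task_grad: Sequence[float],
    lr: float,
    lam: float,
) -> List[float]:
    """One gradient step on task_loss(w) + (lam / 2) * ||w - cluster_w||^2.

    The penalty's gradient is lam * (w - cluster_w), so each update pulls the
    local weights toward the cluster model in addition to following the task.
    """
    return [
        w - lr * (g + lam * (w - c))
        for w, c, g in zip(local_w, cluster_w, task_grad)
    ]


# Hypothetical two-parameter model with zero task gradient,
# so only the pull toward the cluster model is visible.
local = [1.0, -2.0]
cluster = [0.0, 0.0]
zero_grad = [0.0, 0.0]
updated = regularized_step(local, cluster, zero_grad, lr=0.5, lam=1.0)
print(updated)
```

With lam set to 0 the step reduces to ordinary local training. Larger values keep a few-shot client closer to its cluster model, which is one way to limit overfitting on scarce local data.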
Funding: Funded by the Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to use a small number of labeled samples effectively, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments conducted on several benchmark satellite datasets demonstrate that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860. These results indicate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
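The Purity score reported above can be computed with the usual textbook definition: each cluster is credited with its most frequent ground-truth class, and the credited samples are divided by the total. The sketch below is a minimal illustration under that assumption; the function name `purity` and its arguments are hypothetical and are not taken from the SS-MPFC paper.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one sample is required")

    # Count ground-truth classes inside every predicted cluster.
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1

    # Each cluster contributes the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)
```

Purity never decreases when clusters are split further, which is presumably why the abstract reports it alongside Accuracy rather than on its own.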
Funding: Funded by the Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Multi-view clustering is a critical research area in computer science that aims to extract meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows for a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, on the 3 Sources dataset MPFC attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62476258, 62076228, and 62325604.
Abstract: Binary multi-view clustering has attracted intense attention from researchers due to its efficiency in handling large-scale datasets. However, previous clustering approaches suffer from at least two limitations. First, they ignore correlations among the features of the original data. As a result, the geometric consistency of the data is not preserved in the to-be-learnt binary representation space. Second, redundant and noisy features mixed into the original data inevitably limit the ultimate clustering performance. In light of this, we propose a novel discriminative binary multi-view clustering (DBMVC) method to address these issues. Specifically, the proposed DBMVC first maps the original data onto the Hamming space to obtain corresponding binary codes, which effectively reduces the computational complexity and storage costs of the following steps. To enable our method to select useful features from the original data and obtain a discriminative representation, the-norm is used to constrain the feature projection matrix. In addition, a graph regularization term is introduced to preserve the local manifold structure of the learned binary representation. Finally, an alternating iterative optimization algorithm is designed to solve the optimization problems of the objective function. Comprehensive experiments on six large-scale multi-view datasets validate that the proposed DBMVC markedly outperforms other state-of-the-art methods in terms of effectiveness and efficiency.
Funding: Supported by the National Natural Science Foundation of China (12473105 and 12473106); the central government guides local funds for science and technology development (YDZJSX2024D049); and the Graduate Student Practice and Innovation Program of Shanxi Province (2024SJ313).
Abstract: As large-scale astronomical surveys, such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), generate increasingly complex datasets, clustering algorithms have become vital for identifying patterns and classifying celestial objects. This paper systematically investigates the application of five main categories of clustering techniques (partition-based, density-based, model-based, hierarchical, and "others") across a range of astronomical research over the past decade. The review focuses on six key application areas: stellar classification, galaxy structure analysis, detection of galactic and interstellar features, high-energy astrophysics, exoplanet studies, and anomaly detection. It provides an in-depth analysis of the performance and results of each method, considering their respective suitability for different data types. Additionally, it presents clustering algorithm selection strategies based on the characteristics of the spectroscopic data being analyzed. We highlight challenges such as handling large datasets, the need for more efficient computational tools, and the lack of labeled data. We also underscore the potential of unsupervised and semi-supervised clustering approaches to overcome these challenges, offering insight into their practical applications, performance, and results in astronomical research.
Funding: Supported by the Shanghai Aerospace Science and Technology Innovation Foundation (SAST2023-075).
Abstract: Multichannel signals are characterized by information diversity and information consistency. To better explore and utilize the affinity relationship within multichannel signals, a new graph learning technique based on low-rank tensor approximation is proposed for multichannel monitoring signal processing and utilization. First, the affinity relationship of multichannel signals is acquired from the clustering results of each channel signal, and an affinity tensor is constructed to integrate the diverse and consistent clustering information among the channels. Second, a low-rank tensor optimization model is built, and the joint affinity matrix is optimized with the assistance of the strong-confidence affinity matrix. Solving the optimization model yields the fused affinity relationship graph of the multichannel signals. Finally, the multichannel fused clustering results are obtained through the updated joint affinity relationship graph. Multichannel signal utilization examples in health state assessment with public datasets and in microwave detection with actual echoes verify the advantages and effectiveness of the proposed method.
Funding: Supported by the National Key R&D Program of China (2023YFC3304600).
Abstract: Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationships between samples. To overcome these challenges, this paper proposes a novel multi-view deep subspace clustering model based on multi-order neighborhood fusion. We integrate the multi-order proximity graph structures of different views into the self-expressive layer through a multi-order neighborhood fusion module. With this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix; we can then obtain an optimal global affinity matrix in which each connected node belongs to one cluster. In addition, a discriminative constraint between views is designed to further improve clustering performance. Experiments on six public datasets demonstrate that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
Funding: Supported by the Research Project of China Southern Power Grid (No. 056200KK52222031).
Abstract: This paper proposes an equivalent modeling method for photovoltaic (PV) power stations via a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude–frequency curves of different filter parameters are analyzed. Based on the results, a grouping parameter set for characterizing the external filter characteristics is established. These parameters are further defined as clustering parameters. A single PV inverter model is then established as a prerequisite. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, effectively overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-the-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
Funding: Supported by the NSFC (Grant Nos. 62176273, 62271070, 62441212); the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant SKLNST-2024-1-062025Major Project of the Natural Science Foundation of Inner Mongolia (2025ZD008).
Abstract: The Intrusion Detection System (IDS) is a security mechanism developed to observe network traffic and recognize suspicious or malicious activities. Clustering algorithms are often incorporated into IDS; however, conventional clustering-based methods face notable drawbacks, including poor scalability in handling high-dimensional datasets and a strong dependence of outcomes on initial conditions. To overcome the performance limitations of existing methods, this study proposes a novel quantum-inspired clustering algorithm that relies on a similarity coefficient-based quantum genetic algorithm (SC-QGA) and an improved quantum artificial bee colony algorithm hybrid K-means (IQABC-K). First, the SC-QGA algorithm is constructed based on quantum computing and integrates similarity coefficient theory to strengthen genetic diversity and feature extraction capabilities. For the subsequent clustering phase, the process based on the IQABC-K algorithm is enhanced with adaptive rotation gate and movement exploitation strategies, which balance the exploration capability of global search against the exploitation capability of local search. In addition, a global optimum bootstrap strategy and a linear population reduction strategy accelerate convergence toward the global optimum and reduce computational complexity. In an experimental evaluation against multiple algorithms and with diverse performance metrics, the proposed algorithm achieves accuracies of 98.57%, 98.81%, and 98.32% on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets, respectively. These results affirm its potential as an effective solution for practical clustering applications.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52069029, 52369026); the Belt and Road Special Foundation of National Key Laboratory of Water Disaster Prevention (Grant No. 2023490411); the Yunnan Agricultural Basic Research Joint Special General Project (Grant Nos. 202501BD070001-060, 202401BD070001-071); the Construction Project of the Yunnan Key Laboratory of Water Security (No. 20254916CE340051); and the Youth Talent Project of the "Xingdian Talent Support Plan" in Yunnan Province (Grant No. XDYC-QNRC-2023-0412).
Abstract: Deformation prediction for extra-high arch dams is highly important for ensuring their safe operation. To address the challenges of complex monitoring data, the uneven spatial distribution of deformation, and the construction and optimization of a prediction model, a multipoint deformation prediction model for extra-high arch dams based on cluster partitioning, namely CEEMDAN-KPCA-GSWOA-KELM, is proposed. First, the monitoring data are preprocessed via variational mode decomposition (VMD) and wavelet denoising (WT), which filters out noise and improves the signal-to-noise ratio of the data, providing high-quality input for the subsequent prediction models. Second, cluster partitioning is performed via the K-means++ algorithm to capture the spatial distribution characteristics of extra-high arch dams and ensure the consistency of deformation trends at measurement points within each partition. Finally, CEEMDAN is used to decompose the monitoring data; each component is predicted and analyzed by combining KPCA (Kernel Principal Component Analysis) with a KELM (Kernel Extreme Learning Machine) optimized by the GSWOA (Global Search Whale Optimization Algorithm); and the component predictions are integrated via reconstruction to predict the overall deformation trend of the dam. An extra-high arch dam project is taken as an example and validated via a comparative analysis of multiple models. The results show that the proposed multipoint deformation prediction model can combine data from different measurement points, achieve a comprehensive and precise prediction of the deformation of extra-high arch dams, and provide technical support for safe operation.
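The K-means++ partitioning step mentioned above differs from plain K-means mainly in how the initial centers are chosen: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal sketch of that seeding rule follows. The representation of measurement points as coordinate tuples and the names `squared_distance` and `kmeans_pp_seeds` are assumptions made for illustration, not details from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from `points` using the K-means++ rule."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # The first center is drawn uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] is the squared distance from points[i] to its closest center.
    nearest = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(nearest) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            index = rng.randrange(len(points))
        else:
            # Far-away points are proportionally more likely to be selected.
            index = rng.choices(range(len(points)), weights=nearest)[0]
        centers.append(points[index])
        nearest = [
            min(d, squared_distance(p, centers[-1]))
            for p, d in zip(points, nearest)
        ]
    return centers
```

The returned seeds would then be passed to ordinary Lloyd iterations; a point already chosen has zero weight, so it is not normally drawn twice.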
Funding: Supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health & Welfare, Republic of Korea (No. RS-2020-KH088726); the Patient-Centered Clinical Research Coordinating Center (PACEN), the Ministry of Health and Welfare, Republic of Korea (No. HC19C0276); and the National Research Foundation of Korea (NRF), the Korea Government (MSIT) (No. RS-2023-00247504).
Abstract: AIM: To evaluate long-term visual field (VF) prediction using K-means clustering in patients with primary open angle glaucoma (POAG). METHODS: Patients who had undergone at least ten 24-2 VF tests were included in this study. Using the 52 total deviation values (TDVs) from the first 10 VF tests of the training dataset, VF points were clustered into several regions using hierarchical ordered partitioning and collapsing hybrid (HOPACH) and K-means clustering. Based on the clustering results, a linear regression analysis was applied to each clustered region of the testing dataset to predict the TDVs of the 10th VF test. Three to nine VF tests were used to predict the 10th VF test, and the prediction errors (root mean square error, RMSE) of each clustering method and of pointwise linear regression (PLR) were compared. RESULTS: The training group consisted of 228 patients (mean age, 54.20 ± 14.38 years; 123 males and 105 females), and the testing group included 81 patients (mean age, 54.88 ± 15.22 years; 43 males and 38 females). All subjects were diagnosed with POAG. The 52 VF points were clustered into 11 and nine regions using HOPACH and K-means clustering, respectively. K-means clustering had a lower prediction error than PLR when n=1:3 and 1:4 (both P≤0.003). The prediction errors of K-means clustering were lower than those of HOPACH in all sections (n=1:4 to 1:9; all P≤0.011), except for n=1:3 (P=0.680). PLR outperformed K-means clustering only when n=1:8 and 1:9 (both P≤0.020). CONCLUSION: K-means clustering can predict long-term VF test results more accurately in patients with POAG when VF data are limited.
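The region-wise regression in METHODS can be pictured as follows: average the TDVs of the points in each region at every visit, fit a straight line to those means over the visit index, and extrapolate to the target visit. The abstract does not spell out these details, so the sketch below is only one plausible reading; the function names, the use of regional means, and the assignment of one predicted value to every point in a region are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def predict_by_region(series, regions, target_visit):
    """Predict the TDV of every VF point at `target_visit`.

    series[v][p] is the TDV of point p at visit v + 1 (visits count from 1).
    regions[p] is the cluster label of point p (assumed to be given).
    """
    visits = list(range(1, len(series) + 1))

    # Group point indices by region label.
    region_points = {}
    for point, label in enumerate(regions):
        region_points.setdefault(label, []).append(point)

    prediction = [0.0] * len(regions)
    for points in region_points.values():
        # Mean TDV of the region at each visit.
        means = [sum(visit[p] for p in points) / len(points) for visit in series]
        intercept, slope = fit_line(visits, means)
        value = intercept + slope * target_visit
        for p in points:
            prediction[p] = value
    return prediction
```

Pooling points before fitting reduces the noise in each regression, which is consistent with the reported advantage of clustering when only three or four tests are available.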
Abstract: The increasing prevalence of multi-view data has made multi-view clustering a crucial technique for discovering latent structures from heterogeneous representations. However, traditional fuzzy clustering algorithms cope poorly with the inherent uncertainty and imprecision of such data, as they rely on a single-dimensional membership value. To overcome these limitations, we propose an auto-weighted multi-view neutrosophic fuzzy clustering (AW-MVNFC) algorithm. Our method leverages the neutrosophic framework, an extension of fuzzy sets, to explicitly model imprecision and ambiguity through three membership degrees. The core novelty of AW-MVNFC lies in a hierarchical weighting strategy that adaptively learns both the contribution of each data view and the importance of each feature within a view. Through a unified objective function, AW-MVNFC jointly optimizes the neutrosophic membership assignments, cluster centers, and the distributions of view and feature weights. Comprehensive experiments conducted on synthetic and real-world datasets show that our algorithm achieves more accurate and stable clustering than existing methods, demonstrating its effectiveness in handling the complexities of multi-view data.
Funding: Supported by the Deanship of Research at the King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia, under Project No. EC241001.
Abstract: Various factors, including weak tie-lines in electric power system (EPS) networks, can lead to low-frequency oscillations (LFOs), which pose no immediate threat but are slow-acting and harmful. To address this challenge, this article proposes a clustering-based machine learning (ML) framework that enhances the stability of EPS networks by suppressing LFOs through real-time tuning of key power system stabilizer (PSS) parameters. To validate the proposed strategy, two distinct EPS networks are selected: the single-machine infinite-bus (SMIB) system with a single-stage PSS, and the unified power flow controller (UPFC) coordinated SMIB system with a double-stage PSS. To generate data under various loading conditions for both networks, an efficient but offline meta-heuristic algorithm, the grey wolf optimizer (GWO), is used, with the loading conditions as inputs and the key PSS parameters as outputs. The generated loading conditions are then clustered using the fuzzy k-means (FKM) clustering method. Finally, group method of data handling (GMDH) and long short-term memory (LSTM) ML models are developed on the clustered data to predict key PSS parameters in real time for any loading condition. Several well-known statistical performance indices (SPIs) are used to validate the training and testing of the developed FKM-GMDH and FKM-LSTM models and to assess their robustness in predicting PSS parameters. The performance of the ML models is also evaluated using three stability indices (minimum damping ratio, eigenvalues, and time-domain simulations) after the PSS is tuned with the parameters estimated in real time under changing operating conditions. In addition, the outputs of the offline (GWO-based) meta-heuristic model, the proposed real-time (FKM-GMDH and FKM-LSTM) ML models, and models previously reported in the literature are compared. According to the results, the proposed methodology outperforms the others in enhancing the stability of the selected EPS networks by damping out the observed unwanted LFOs under various loading conditions.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62073087, U1911401, 62071132, and 61973090) and the Guangdong Key R&D Project of China (Grant No. 2019B010121001).
Abstract: Real-world data can often be represented in multiple forms and views, and analyzing data from different perspectives allows for more comprehensive learning, resulting in better clustering results. Non-negative matrix factorization (NMF) is used in clustering to extract uniform, discriminative low-dimensional features from multi-view data. Many clustering methods based on graph regularization have been proposed and proven effective, but ordinary graphs consider only pairwise relationships between samples. To learn the higher-order relationships that exist in the sample manifold and feature manifold of multi-view data, we propose a new semi-supervised multi-view clustering method called dual hypergraph regularized partially shared non-negative matrix factorization (DHPS-NMF). The complex manifold structure of samples and features is learned by constructing sample and feature hypergraphs. To improve the discriminative power of the obtained low-dimensional features, semi-supervised regression terms are incorporated into the model so that label information is used effectively when capturing the complex manifold structure of the data. Finally, we conduct experiments on six real datasets, and the results show that our algorithm compares favorably with several existing methods.
Funding: Supported in part by an NUS startup grant and the National Natural Science Foundation of China (52076037).
Abstract: Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be given the correct number of clusters. In addition, these existing algorithms create only hard or fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. Adopting hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a two-dimensional human-readable chart by calculating two newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. In addition, SRMVEC outperforms some state-of-the-art methods on benchmark datasets.
Funding: Supported by the National Natural Science Foundation of China (No. 52075306).
Abstract: To realize intelligent mechanization of the last process in the fruit industry chain, the identification of fruit packing boxes is studied. A multi-view database is established to describe the omnidirectional attitudes of the fruit packing boxes. To reduce the data redundancy caused by multi-view acquisition, a new binary multi-view kernel principal component analysis network (BMKPCANet) is built, and a multi-view recognition method for fruit packing boxes is proposed based on BMKPCANet and a support vector machine (SVM). The experimental results show that, on average, the recognition accuracy of the proposed BMKPCANet is 12.82% higher than that of PCANet and 3.51% higher than that of KPCANet, and its time consumption is 7.74% lower than that of PCANet and 29.01% lower than that of KPCANet. This work lays a theoretical foundation for multi-view recognition of 3D objects and has good practical application value.
Funding: This work was supported by the National Natural Science Foundation of China (No. 61976247) and the Basic Ability Promotion Project of Guangxi Middle-Aged and Young University Teacher.
Abstract: As a class of effective methods for incomplete multi-view clustering, graph-based algorithms have recently drawn wide attention. However, most of them could be improved in the following respects. First, in some graph-based models, all views are forced to share a common similarity graph regardless of the severe consistency degeneration caused by incomplete views. Second, similarity graph construction and cluster analysis are sometimes performed separately. Third, the differing contributions of individual views are not always carefully considered. To address these issues simultaneously, this paper proposes an incomplete multi-view clustering algorithm based on auto-weighted fusion in partition space. In our algorithm, cluster structure information is introduced into the similarity learning process to construct a desirable similarity graph, information fusion is performed in partition space to alleviate the negative impact of consistency degradation, and all views are adaptively weighted to reflect their different contributions to the clustering task. Finally, all the subtasks are collaboratively optimized in a unified framework to reach an overall optimal result. Experimental results show that the proposed method compares favorably with state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China (No. 61806006) and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: Existing multi-view subspace clustering algorithms based on tensor singular value decomposition (t-SVD) predominantly use the tensor nuclear norm to explore the intra-view correlation between views of the same samples, while neglecting the correlation among the samples within different views. Moreover, the limitations of the tensor nuclear norm as a convex approximation of the tensor rank function are not fully considered: treating different singular values equally may result in a suboptimal tensor representation. A hypergraph regularized multi-view subspace clustering algorithm with dual tensor log-determinant (HRMSC-DTL) is proposed. The algorithm uses subspace learning in each view to learn a specific set of affinity matrices and introduces a non-convex tensor log-determinant function to replace the tensor nuclear norm and improve global low-rankness. It also introduces hyper-Laplacian regularization to preserve the local geometric structure embedded in the high-dimensional space. Furthermore, it rotates the original tensor and incorporates a dual tensor mechanism to fully exploit the intra-view correlation of the original tensor and the inter-view correlation of the rotated tensor. An alternating direction method of multipliers (ADMM) scheme is also designed to solve the non-convex optimization model. Experimental evaluations on seven widely used datasets, along with comparisons to several state-of-the-art algorithms, demonstrate the superiority and effectiveness of the HRMSC-DTL algorithm in terms of clustering performance.
Funding: Supported in part by the Youth Foundation of Shanxi Province (5113240053); the Fundamental Research Funds for the Central Universities (G2023KY05102); the Natural Science Foundation of China (61976120); the Natural Science Foundation of Jiangsu Province (BK20231337); and the Natural Science Key Foundation of Jiangsu Education Department (21KJA510004).
Abstract: It is challenging to cluster multi-view data in which the clusters have overlapping areas. Existing multi-view clustering methods often misclassify the indistinguishable objects in overlapping areas by forcing them into single clusters, increasing clustering errors. Our solution, the multi-view dynamic kernelized evidential clustering method (MvDKE), addresses this by assigning these objects to meta-clusters, each a union of several related singleton clusters, effectively capturing the local imprecision in overlapping areas. MvDKE offers two main advantages: first, it significantly reduces computational complexity through a dynamic framework for evidential clustering; second, it handles non-spherical data using kernel techniques within its objective function. Experiments on various datasets confirm MvDKE's ability to accurately characterize the local imprecision in multi-view non-spherical data, achieving better efficiency and outperforming existing methods in overall performance.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61572407).
Abstract: In the big data era, data are generated from different sources or observed from different views. Such data are referred to as multi-view data. Unleashing the power of the knowledge in multi-view data is very important in big data mining and analysis. This calls for advanced techniques that consider the diversity of different views while fusing these data. Multi-view Clustering (MvC) has attracted increasing attention in recent years by aiming to exploit complementary and consensus information across multiple views. This paper summarizes a large number of multi-view clustering algorithms, provides a taxonomy according to the mechanisms and principles involved, and classifies these algorithms into five categories: co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. Multi-view graph clustering is further categorized into graph-based, network-based, and spectral-based methods. Multi-view subspace clustering is further divided into subspace learning-based and non-negative matrix factorization-based methods. This paper not only introduces the mechanisms for each category of methods but also gives a few examples of how these techniques are used. In addition, it lists some publicly available multi-view datasets. Overall, this paper serves as an introductory text and survey for multi-view clustering.
Funding: Supported by the Foundation of President of Hebei University (XZJJ202303).
Abstract: Federated learning is a machine learning framework designed to protect privacy by keeping training data on clients’ devices without sharing private data. It trains a global model through collaboration between clients and the server. However, data heterogeneity can lead to inefficient model training and even reduce the final model’s accuracy and generalization capability. Meanwhile, data scarcity can result in suboptimal cluster distributions for few-shot clients in centralized clustering tasks, and standalone personalization tasks may cause severe overfitting. To address these limitations, we introduce a federated learning dual optimization model based on clustering and personalization strategy (FedCPS). FedCPS adopts a decentralized approach, in which clients identify their cluster membership locally without relying on a centralized clustering algorithm. Building on this, FedCPS introduces personalized training tasks locally, adding a regularization term to control deviations between local and cluster models. This improves the generalization ability of the final model while mitigating overfitting. The use of weight-sharing techniques also reduces the computational cost of central machines. Experimental results on the MNIST, FMNIST, CIFAR10, and CIFAR100 datasets demonstrate that our method achieves better personalization than other personalized federated learning methods, with an average test accuracy improvement of 0.81%–2.96%. We also varied the proportion of few-shot clients to evaluate the impact on accuracy across different methods. The experiments show that the accuracy of FedCPS drops by only 0.2%–3.7%, compared to 2.1%–10% for existing methods. Our method demonstrates its advantages across diverse data environments.
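The regularization term that keeps a local model close to its cluster model can be illustrated with a quadratic proximal penalty, as used in FedProx-style methods. The abstract does not give the exact form used by FedCPS, so the penalty (mu / 2) * ||w - w_cluster||^2, the function name, and the parameters `learning_rate` and `mu` below are assumptions made purely for illustration.

```python
def regularized_local_step(weights, cluster_weights, task_gradient,
                           learning_rate=0.1, mu=0.5):
    """One gradient step on local_loss(w) + (mu / 2) * ||w - w_cluster||^2.

    weights, cluster_weights, and task_gradient are equal-length lists of
    floats; task_gradient is the gradient of the local task loss at `weights`.
    """
    return [
        # The penalty adds mu * (w - c) to the gradient, pulling w toward c.
        w - learning_rate * (g + mu * (w - c))
        for w, c, g in zip(weights, cluster_weights, task_gradient)
    ]
```

A larger `mu` ties a client more tightly to its cluster model, which limits overfitting on few-shot clients at the cost of less personalization.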