Funding: Supported by the National Natural Science Foundation of China under Grant No. 62162009; the Key Technologies R&D Program of Henan Province under Grant No. 242102211065; the Postgraduate Education Reform and Quality Improvement Project of Henan Province under Grant Nos. YJS2025GZZ36, YJS2024AL112, and YJS2024JD38; the Innovation Scientists and Technicians Troop Construction Projects of Henan Province under Grant No. CXTD2017099; and the Scientific Research Innovation Team of Xuchang University under Grant No. 2022CXTD003.
Abstract: As malware attack techniques grow more complex, traditional detection methods face significant challenges, such as privacy preservation, data heterogeneity, and a lack of category information. To address these issues, we propose Federated Dynamic Prototype Learning (FedDPL) for malware classification, which integrates Federated Learning with a specifically designed K-means. Under the Federated Learning framework, model training occurs locally without data sharing, protecting user data privacy and preventing the leakage of sensitive information. Furthermore, to tackle data heterogeneity and the lack of category information, FedDPL introduces a dynamic prototype learning mechanism that adaptively adjusts both the position and the number of clustering prototypes. This significantly reduces the dependency on a predefined number of categories found in typical K-means and its variants, resulting in improved clustering performance. In theory, this enables more accurate detection of malicious behavior. Experimental results confirm that FedDPL excels at malware classification tasks, demonstrating superior accuracy, robustness, and privacy protection.
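The abstract does not spell out how prototypes are exchanged or how their number is adapted, so the following is only a minimal sketch of the general idea, not FedDPL itself. It assumes each client runs plain Lloyd's k-means locally and shares only (centroid, count) pairs, and that the server fuses prototypes lying within a hypothetical `merge_radius`, so the global prototype count is not fixed in advance. The names `farthest_first`, `local_prototypes`, `merge_prototypes`, and `merge_radius` are illustrative assumptions.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def local_prototypes(points, k, iters=10):
    """Client side: plain Lloyd's k-means on local data only.
    Returns (centroid, sample_count) pairs; raw samples never leave the client."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]

def merge_prototypes(client_protos, merge_radius):
    """Server side (assumed rule): greedily fuse prototypes closer than
    merge_radius using a count-weighted mean; others become new prototypes."""
    merged = []  # each entry is [centroid, weight]
    for protos in client_protos:
        for c, n in protos:
            for m in merged:
                if math.dist(c, m[0]) < merge_radius:
                    total = m[1] + n
                    m[0] = tuple((a * m[1] + b * n) / total for a, b in zip(m[0], c))
                    m[1] = total
                    break
            else:
                merged.append([c, n])
    return [(m[0], m[1]) for m in merged]

# Two clients with partly overlapping categories (toy data).
client_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
client_b = [(10, 10), (10, 11), (11, 10), (11, 11), (20, 20), (20, 21), (21, 20), (21, 21)]
global_protos = merge_prototypes(
    [local_prototypes(client_a, 2), local_prototypes(client_b, 2)], merge_radius=3.0
)
print(global_protos)
```

In this toy run each client asks for two prototypes, yet the server ends up with three, because the category shared by both clients is fused while the two client-specific ones are kept. A real system would also need to choose `merge_radius` from the data and protect the shared prototypes themselves, which this sketch does not attempt.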
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62162009; the Key Technologies R&D Program of Henan Province under Grant No. 242102211065; the Scientific Research Innovation Team of Xuchang University under Grant No. 2022CXTD003; and the Postgraduate Education Reform and Quality Improvement Project of Henan Province under Grant No. YJS2024JD38.
Abstract: For optimal k-prototype discovery, k-means-like algorithms offer inspiration for collecting central samples, yet unstable seed sample selection, the assumption of a circle-like pattern, and the unknown K remain challenges, particularly for data patterns that are not predetermined. We propose an adaptive k-prototype clustering method (kProtoClust) that starts cluster exploration from a rough division into K clusters and then looks for evidence for splitting and merging. As representatives of a group of data samples, support vectors and outliers, viewed from the perspective of support vector data description, are not appropriate candidates for prototypes, whereas inner samples become the first candidates, which reduces seed instability. Unlike traditional sample representation, we extend sample selection by allowing fictitious samples in order to emphasize the representativeness of patterns. To overcome the circle-like pattern limitation, we introduce a convex decomposition-based one-cluster-multiple-prototypes strategy in which convex hulls of varying sizes serve as prototypes, and accurate connection analysis makes it possible to support arbitrary cluster shapes. Inspired by geometry, the three presented strategies allow kProtoClust to bypass the dependence on K by analyzing the global and local position relationships among data samples. Experimental results on twelve datasets with irregular cluster shapes or high dimensionality suggest that kProtoClust handles arbitrary cluster shapes with prominent accuracy even without prior knowledge of K.
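The one-cluster-multiple-prototypes idea can be illustrated with a much simpler stand-in than the method described above. The sketch below assumes point prototypes instead of convex hulls, and replaces the paper's connection analysis with a plain distance threshold: prototypes closer than a hypothetical `link_dist` are joined with union-find, and each connected group becomes one cluster. The function name `group_prototypes` and the threshold are assumptions for illustration only.

```python
import math

def group_prototypes(prototypes, link_dist):
    """Label prototypes so that any two closer than link_dist share a cluster.
    Chains of linked prototypes can follow an elongated or curved cluster."""
    parent = list(range(len(prototypes)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            if math.dist(prototypes[i], prototypes[j]) < link_dist:
                parent[find(i)] = find(j)

    # Renumber roots as consecutive cluster labels 0, 1, 2, ...
    roots = sorted({find(i) for i in range(len(prototypes))})
    return [roots.index(find(i)) for i in range(len(prototypes))]

# Four prototypes tracing a bent shape, plus two forming a separate cluster.
protos = [(0, 0), (1, 0), (2, 0), (3, 1), (10, 10), (11, 10)]
labels = group_prototypes(protos, link_dist=1.5)
print(labels)  # [0, 0, 0, 0, 1, 1]
```

The number of clusters (two here) falls out of the connectivity rather than being supplied as K, which is the property the abstract emphasizes. How well this works in practice depends entirely on the choice of `link_dist`, which this sketch leaves as a free parameter.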