Fuzzy C-Means Clustering Algorithm Based on Mixed Noise-aware under Local Differential Privacy

Original title: 满足本地差分隐私的混合噪音感知的模糊C均值聚类算法

Cited by: 4

Abstract: In big data and Internet of Things (IoT) applications, Local Differential Privacy (LDP) is used to protect user privacy in clustering analysis. Existing methods, however, either cluster interactively under LDP, which consumes a large privacy budget, or fail to account jointly for the Gaussian noise inherent in the clustering data (which reflects data quality) and the Laplacian noise injected to satisfy LDP, which leads to low clustering accuracy. In addition, the distance used to measure similarity between user-submitted data and cluster centers is chosen rather arbitrarily and does not fully exploit the noise patterns contained in the submitted noisy data. To address this, the paper proposes a mixed noise-aware fuzzy C-means clustering algorithm satisfying LDP (mnFCM). Its main idea is to jointly model the Gaussian noise that reflects data quality and the Laplacian noise injected to protect user data, and then to design a mixed noise-aware distance that replaces the traditional Euclidean distance for measuring similarity between samples and cluster centers. Specifically, the paper first designs the mixed noise-aware distance calculation, then gives a new objective function on this basis, designs a solution method based on the Lagrange multiplier method, and finally analyzes the convergence of the solution algorithm theoretically. The paper further analyzes the privacy, utility, and complexity of mnFCM. The analysis shows that the algorithm strictly satisfies LDP, yields cluster centers closer to the non-private ones than the compared algorithms, and has complexity close to that of the non-private algorithm. Experiments on two real datasets show that mnFCM improves clustering accuracy by 10%~15% while satisfying LDP.

Objective: In big data and Internet of Things (IoT) applications, clustering analysis of collected data is crucial for enhancing user experience. To mitigate the privacy risks of using raw data directly, Local Differential Privacy (LDP) techniques are often employed. However, existing LDP clustering studies either require interactive execution, which consumes significant privacy budget, or fail to balance the Gaussian noise in the clustering data with the Laplacian noise added for LDP protection, resulting in low clustering accuracy. Moreover, the distance metrics used for similarity measurement are chosen arbitrarily, without fully utilizing the noise characteristics of user-submitted noisy data. This study designs a mixed noise-aware distance calculation method and integrates it into the fuzzy C-means clustering algorithm, reducing the impact of noise on clustering results while protecting data privacy, so that both privacy and clustering quality are preserved. It provides a robust solution for processing sensitive information in high-dimensional data environments.

Methods: This paper proposes a mixed noise-aware Fuzzy C-Means clustering algorithm (mnFCM) under LDP. The core idea is to model both the Gaussian noise (representing data quality) and the Laplacian noise (added for data protection) in uploaded user data by constructing a more accurate mixed distribution model, and to design a mixed noise-aware distance that replaces the Euclidean distance for measuring similarity between samples and cluster centers. Specifically, the paper first designs a mixed noise-aware distance calculation method. On this basis, a new objective function is proposed, and a solution method is designed based on the Lagrange multiplier method. Finally, the convergence of the solution algorithm is analyzed theoretically.

Results and Discussions: The experimental results show that as the privacy budget ε increases, the performance of the clustering algorithms generally improves. Notably, mnFCM achieves at least an 8.5% improvement in accuracy over the state-of-the-art PrivPro algorithm (Fig. 1). This is because mnFCM considers both Gaussian noise (reflecting data quality) and Laplacian noise (for LDP protection) and uses a mixed noise-aware distance metric to improve sample similarity measurement, thereby protecting privacy while maintaining clustering performance. Experiments on the fuzziness parameter m show that when m=2, all algorithms reach their peak F-Measure values and lowest Entropy values (Fig. 2), which supports m=2 as the best balance point for clustering effectiveness. Additionally, the running time of mnFCM is 1.0 to 1.4 times that of the non-privacy-preserving Nopriv algorithm (Table 2), owing to its refined noise processing mechanism. Ablation experiments demonstrate that the MixDis scheme achieves the best clustering performance on both the NG and UW datasets (Fig. 4), as it considers both Laplacian and Gaussian noise, making the clustering more robust. Comparative analysis against other privacy-preserving clustering algorithms on the synthetic dataset Syn shows that DP-DPCL+ consistently outperforms DP-DPCL, and DPC+ consistently outperforms DPC (Fig. 5). In addition, varying the four adjustable parameters (privacy budget ε, sample size N, dimension K, and cluster number C) shows that mnFCM outperforms the other privacy protection schemes (Fig. 6).

Conclusions: This paper addresses privacy protection in fuzzy clustering algorithms by simultaneously considering Gaussian noise (reflecting data quality) and Laplacian noise (for LDP protection), and proposes a mixed noise-aware fuzzy C-means clustering algorithm, mnFCM, that satisfies LDP and balances privacy and clustering quality. It designs a mixed noise-aware distance calculation method, formulates a new objective function, and solves it using the Lagrange multiplier method, with a theoretical analysis of the algorithm's convergence. Theoretical analysis shows that the algorithm strictly satisfies LDP, is closer to the non-private cluster centroids than the baseline algorithms, and has complexity similar to that of non-private algorithms. Experiments demonstrate that the algorithm improves clustering accuracy by 10%~15% on real datasets compared with baseline privacy-preserving algorithms. A limitation of this study is that the privacy budget calculation for the Laplacian noise in the mixed noise setting may be influenced by the Gaussian noise. Future research will explore adaptive noise proportion allocation strategies, such as dynamically adjusting the weights of Gaussian and Laplacian noise, to optimize the privacy-utility trade-off.
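The pipeline summarized in the abstract (clients perturb their records with Laplacian noise, and the server runs fuzzy C-means on the noisy reports) can be illustrated with a minimal Python sketch. It is not the paper's mnFCM: the abstract does not give the mixed noise-aware distance, so the server side below is the standard FCM baseline with squared Euclidean distance, whose membership and center updates follow from the Lagrange multiplier method. The function names, the clipping range [-1, 1], and the noise scale 2K/ε are assumptions made for illustration only.

```python
import numpy as np

def laplace_perturb(x, eps, rng):
    """Client side (hypothetical helper): clip records to [-1, 1]^K, add Laplace noise.

    Assumption: a clipped K-dimensional record has L1 sensitivity 2*K,
    so a per-coordinate noise scale of 2*K/eps gives eps-LDP per record.
    """
    x = np.clip(x, -1.0, 1.0)
    scale = 2.0 * x.shape[1] / eps
    return x + rng.laplace(0.0, scale, size=x.shape)

def sq_euclidean(x, centers):
    """Squared Euclidean distance between every sample and center, shape (N, C)."""
    diff = x[:, None, :] - centers[None, :, :]
    return np.sum(diff ** 2, axis=2)

def fcm(x, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (baseline, not mnFCM).

    Minimizes sum_i sum_j u_ij^m * d_ij subject to sum_j u_ij = 1.
    The two closed-form updates below are the stationary points of the
    Lagrangian for squared Euclidean d_ij.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]   # weighted means
        d = np.maximum(sq_euclidean(x, centers), 1e-12)  # avoid division by zero
        inv = d ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)   # membership update
        converged = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if converged:
            break
    return centers, u
```

In this sketch a server would call `fcm(laplace_perturb(x, eps, rng), c)` on the collected reports. According to the abstract, mnFCM replaces the Euclidean distance with a distance derived from a joint Gaussian and Laplacian noise model and derives its own update rules, which are not reproduced here.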

Authors: ZHANG Pengfei (张朋飞); CHENG Jun (程俊); ZHANG Zhikun (张治坤); FANG Xianjin (方贤进); SUN Li (孙笠); WANG Jie (王杰); JIANG Rong (姜茸)

Affiliations: School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China; Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming 650221, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; School of Safety Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China

Source: Journal of Electronics & Information Technology (《电子与信息学报》), Peking University Core Journal, 2025, No. 3, pp. 739-757 (19 pages)

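Funding: Scientific Research Start-up Fund for High-level Introduced Talents of Anhui University of Science and Technology (2023yjrc92); Open Project of the Yunnan Key Laboratory of Service Computing (YNSC24116); National Natural Science Foundation of China (62202164).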

Keywords: Clustering analysis; Privacy protection; Local Differential Privacy (LDP); Fuzzy C-Means (FCM) clustering; Laplace mechanism

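Classification: TN911 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Automation and Computer Technology: Computer Application Technology]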




