Abstract
Federated learning is a distributed machine-learning approach that enables multiple devices or nodes to collaboratively train a model while keeping their data local. However, because the model is trained over datasets owned by different parties, sensitive data may still be leaked. To mitigate this risk, prior work has applied differential privacy in federated learning by adding noise to gradient data. Yet while these privacy techniques reduce the risk of sensitive-data leakage, model accuracy and effectiveness are also degraded to varying degrees depending on the noise magnitude. To address this problem, this paper proposes an adaptive mechanism for selecting the number of cluster centers (DP-Fed-Adap), which dynamically changes the number of cluster centers according to the training round and the variation of the gradients, allowing the model to maintain the same level of performance while still protecting sensitive data. Experiments show that, under the same privacy budget, DP-Fed-Adap achieves better model performance and privacy protection than the federated similarity algorithm (FedSim) and the federated averaging algorithm (FedAvg) with differential privacy added.
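The abstract states that the number of cluster centers is reduced as training progresses, since gradient directions are dispersed early on and converge later. The exact segmentation function is not given here, so the following is only an illustrative sketch; the function name, thresholds, and default values (`k_max`, `k_min`, the 0.3/0.7 breakpoints) are assumptions, not the paper's actual schedule.

```python
def num_cluster_centers(round_idx: int, total_rounds: int,
                        k_max: int = 10, k_min: int = 2) -> int:
    """Return the number of gradient-cluster centers for a training round.

    Early rounds use more centers (gradient directions are highly
    dispersed); later rounds use fewer (gradients converge toward a
    common direction). The piecewise breakpoints below are illustrative
    placeholders for the paper's segmentation function.
    """
    progress = round_idx / total_rounds
    if progress < 0.3:        # early stage: fine-grained clustering
        return k_max
    elif progress < 0.7:      # middle stage: coarser clustering
        return max(k_min, k_max // 2)
    else:                     # late stage: gradients nearly aligned
        return k_min
```

A schedule of this shape matches the reported observation that fewer centers late in training improve accuracy, while more centers early on better separate dispersed gradient directions.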
Objective Differential privacy, based on strict statistical models, is widely applied in federated learning. The common approach integrates privacy protection by perturbing parameters during local model training and global model aggregation, safeguarding user privacy while maintaining model performance. A key challenge is minimizing performance degradation while ensuring strong privacy protection. An issue arises in early-stage training, where data gradient directions are highly dispersed: directly applying initial data calculations and processing at this stage can reduce the accuracy of the global model.
Methods To address this issue, this study introduces a differential privacy mechanism in federated learning to protect individual privacy while clustering gradient information from multiple data owners. During gradient clustering, the number of cluster centers is dynamically adjusted based on the training epoch, with the rate of change in clusters aligned with the model training process. In the early stages, higher noise levels are introduced to enhance privacy protection. As the model converges, noise is gradually reduced to improve learning of the true data distribution.
Results and discussions The first set of experimental results (Fig. 3) shows that different fixed numbers of cluster centers lead to varying rates of change in training accuracy during the early and late stages of the training cycle. This suggests that reducing the number of cluster centers as training progresses benefits model performance, and the segmentation function is selected based on these findings. The second set of experiments (Fig. 4) indicates that, among four sets of model performance comparisons, our method achieves the highest accuracy in the later stages of training as the number of rounds increases. This demonstrates that adjusting the number of cluster centers during training has a measurable effect: as model training concludes, gradient directions tend to converge, and reducing the number of cluster centers improves accuracy. The performance comparison of the three models (Table 2) further shows that the proposed method outperforms the others in most cases.
Conclusions Comparative experiments on four publicly available datasets demonstrate that the proposed algorithm outperforms baseline methods in model performance after incorporating adaptive cluster-center selection. Additionally, it ensures privacy protection for sensitive data while maintaining a more stable training process. The improved clustering strategy better aligns with the actual training dynamics, validating the effectiveness of this approach.
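The Methods paragraph describes perturbing aggregated gradient information under differential privacy. A standard way to realize this is the Gaussian mechanism over L2-clipped client gradients, as in DP-FedAvg. The sketch below is a minimal pure-Python illustration under that assumption; the function name `dp_aggregate` and the default `clip_norm`/`noise_multiplier` values are hypothetical and not taken from the paper.

```python
import math
import random

def dp_aggregate(client_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Clip each client's gradient vector to clip_norm in L2 norm,
    average across clients, then add Gaussian noise calibrated to the
    clipping bound (the Gaussian mechanism commonly used in DP federated
    averaging). Parameters here are illustrative defaults."""
    rng = random.Random(seed)
    n = len(client_grads)
    dim = len(client_grads[0])
    summed = [0.0] * dim
    for g in client_grads:
        # per-client L2 clipping bounds each client's influence
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, x in enumerate(g):
            summed[i] += x * scale
    avg = [s / n for s in summed]
    # noise standard deviation scales with the clipping bound
    sigma = noise_multiplier * clip_norm / n
    return [a + rng.gauss(0.0, sigma) for a in avg]
```

In the paper's setting this perturbation would be applied to the cluster centers of the grouped gradients rather than to a single global average, with the noise level decreased in later rounds as the Methods paragraph describes.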
Authors
宁博
宁一鸣
杨超
周新
李冠宇
马茜
NING Bo; NING Yiming; YANG Chao; ZHOU Xin; LI Guanyu; MA Qian (School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China; Information and Communication Branch of State Grid Liaoning Electric Power Co., Ltd., Shenyang 110000, China)
Source
《电子与信息学报》
Peking University Core Journal (北大核心)
2025, No. 2, pp. 519-529 (11 pages)
Journal of Electronics & Information Technology
Funding
National Natural Science Foundation of China (61976032, 62002039).
Keywords
Federated Learning(FL)
Differential privacy protection
Gradient clustering
Adaptive selection