Federated Learning (FL) has become a leading decentralized solution that enables multiple clients to train a model collaboratively without directly sharing raw data, making it suitable for privacy-sensitive applications such as healthcare, finance, and smart systems. As the field has evolved, the research landscape has become more complex and fragmented, covering different system designs, training methods, and privacy techniques. This survey is organized around three core challenges: how the data is distributed, how models are synchronized, and how to defend against attacks. It provides a structured and up-to-date review of FL research from 2023 to 2025, offering a unified taxonomy that categorizes works by data distribution (Horizontal FL, Vertical FL, Federated Transfer Learning, and Personalized FL), training synchronization (synchronous and asynchronous FL), optimization strategies, and threat models (data leakage and poisoning attacks). In particular, we summarize the latest contributions in Vertical FL frameworks for secure multi-party learning, communication-efficient Horizontal FL, and domain-adaptive Federated Transfer Learning. Furthermore, we examine synchronization techniques that address system heterogeneity, including straggler mitigation in synchronous FL and staleness management in asynchronous FL. The survey covers security threats in FL, such as gradient inversion, membership inference, and poisoning attacks, together with defense strategies that include privacy-preserving aggregation and anomaly detection. The paper concludes by outlining unresolved issues and highlighting challenges in handling personalized models, scalability, and real-world adoption.
The medical industry generates vast amounts of data suitable for machine learning during patient-clinician interactions in hospitals. However, as a result of data protection regulations such as the General Data Protection Regulation (GDPR), patient data cannot be shared freely across institutions. In these cases, federated learning (FL) is a viable option in which a global model learns from multiple data sites without moving the data. In this paper, we focused on random forests (RFs) for their effectiveness in classification tasks and widespread use throughout the medical industry, and compared two popular federated random forest aggregation algorithms on horizontally partitioned data. We first provided the necessary background on federated learning, the advantages of random forests in a medical context, and the two aggregation algorithms. A series of extensive experiments using four public binary medical datasets (an excerpt of MIMIC III, the Pima Indian diabetes dataset from Kaggle, and the diabetic retinopathy and heart failure datasets from the UCI Machine Learning Repository) was then performed to systematically compare the two algorithms on equal-sized, unequal-sized, and class-imbalanced clients. A follow-up investigation of the effect of increasing the number of clients was also conducted. Finally, we empirically analyzed the advantages of federated learning and concluded that the weighted merge algorithm produces models with, on average, a 1.903% higher F1 score and a 1.406% higher AUCROC value.