Heterogeneous federated learning (HtFL) has gained significant attention due to its ability to accommodate diverse models and data from distributed combat units. Prototype-based HtFL methods were proposed to reduce the high communication cost of transmitting model parameters. These methods allow for the sharing of only class representatives between heterogeneous clients while maintaining privacy. However, existing prototype learning approaches fail to take the data distribution of clients into consideration, which results in suboptimal global prototype learning and insufficient client model personalization capabilities. To address these issues, we propose a fair trainable prototype federated learning (FedFTP) algorithm, which employs a fair sampling training prototype (FSTP) mechanism and a hyperbolic space constraints (HSC) mechanism to enhance the fairness and effectiveness of prototype learning on the server in heterogeneous environments. Furthermore, a local prototype stable update (LPSU) mechanism based on contrastive learning is proposed as a means of maintaining personalization while promoting global consistency. Comprehensive experimental results demonstrate that FedFTP achieves state-of-the-art performance in HtFL scenarios.
As the integration of medical big data and artificial intelligence advances, the secure sharing of medical data has become a key driving force for advancing disease research and clinical diagnosis. Federated learning, a distributed approach enabling collaborative data processing without sharing raw data, offers promising solutions to challenges in multi-center medical data sharing. This review summarizes the progress of federated learning in multi-center medical data processing, analyzed from four perspectives: system architectures, data distribution strategies, clinical tasks, and algorithmic models. At the same time, this paper explores the challenges in practical applications, such as data heterogeneity, communication overhead, and privacy concerns. It proposes driving future research development by optimizing algorithms, strengthening privacy protection mechanisms, and enhancing computational efficiency.
Federated Learning (FL) is a practical solution that leverages distributed data across devices without the need for centralized data storage, enabling multiple participants to jointly train models while preserving data privacy and avoiding direct data sharing. Despite its privacy-preserving advantages, FL remains vulnerable to backdoor attacks, where malicious participants introduce backdoors into local models that are then propagated to the global model through the aggregation process. While existing differential privacy defenses have demonstrated effectiveness against backdoor attacks in FL, they often incur a significant degradation in the performance of the aggregated models on benign tasks. To address this limitation, we propose a novel backdoor defense mechanism based on differential privacy. Our approach first utilizes the inherent out-of-distribution characteristics of backdoor samples to identify and exclude malicious model updates that significantly deviate from benign models. By filtering out models that are clearly backdoor-infected before applying differential privacy, our method reduces the required noise level for differential privacy, thereby enhancing model robustness while preserving performance. Experimental evaluations on the CIFAR10 and FEMNIST datasets demonstrate that our method effectively limits the backdoor accuracy to below 15% across various backdoor scenarios while maintaining high main task accuracy.
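The filter-then-privatize order described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how out-of-distribution updates are detected, so the sketch swaps in a simple distance-to-median filter as a stand-in. The function names, the `factor` cutoff, and the noise scale are all hypothetical, and calibrating `noise_std` to a concrete privacy budget is not shown.

```python
import math
import random
import statistics


def l2(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def filter_outliers(updates, factor):
    """Keep updates whose distance to the coordinate-wise median is small.

    Stand-in for the paper's (unspecified) out-of-distribution test.
    """
    center = [statistics.median(col) for col in zip(*updates)]
    dists = [l2([u - c for u, c in zip(upd, center)]) for upd in updates]
    cutoff = factor * statistics.median(dists)
    return [upd for upd, d in zip(updates, dists) if d <= cutoff]


def clip(update, max_norm):
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = l2(update)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_aggregate(updates, max_norm, noise_std, rng):
    """Average the clipped updates and add Gaussian noise per coordinate."""
    clipped = [clip(u, max_norm) for u in updates]
    n = len(clipped)
    return [sum(col) / n + rng.gauss(0.0, noise_std) for col in zip(*clipped)]


# Three similar (benign-looking) updates and one that deviates strongly.
updates = [[0.1, 0.2], [0.12, 0.18], [0.09, 0.21], [5.0, -4.0]]
kept = filter_outliers(updates, factor=3.0)
noisy = dp_aggregate(kept, max_norm=1.0, noise_std=0.01, rng=random.Random(0))
```

Because the deviating update is dropped before noise is added, the remaining updates are closer together, which is the intuition behind needing less noise for the same protection.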
Data sharing technology in the Internet of Vehicles (IoV) has attracted great research interest with the goal of realizing intelligent transportation and traffic management. Meanwhile, major concerns have been raised about the security and privacy of vehicle data. The mobility and real-time characteristics of vehicle data make data sharing more difficult in IoV. The emergence of blockchain and federated learning brings new directions. In this paper, a data-sharing model that combines blockchain and federated learning is proposed to solve the security and privacy problems of data sharing in IoV. First, we use federated learning to share data instead of exposing actual data and propose an adaptive differential privacy scheme to further balance the privacy and availability of data. Then, we integrate the verification scheme into the consensus process, so that the consensus computation can filter out low-quality models. Experimental results show that our data-sharing model can better balance the relationship between data availability and privacy, and also has enhanced security.
The martensite start temperature is a critical parameter for steels with metastable austenite. Although numerous models have been developed to predict the martensite start (Ms) temperature, the complexity of the martensitic transformation greatly limits their performance and extensibility. In this work, we apply deep data mining of thermodynamic calculations and deep learning to develop a generic model for Ms prediction. Deep data mining was used to establish a hierarchical database with three levels of information. Then, a convolutional neural network model, which can accurately treat the hierarchical data structure, was used to obtain the final model. By integrating thermodynamic calculations, traditional machine learning, and deep learning modeling, the final predictor model shows excellent generalizability and extensibility, i.e., good model performance both within and beyond the composition range of the original database. The effects of 15 alloying elements were considered successfully using the proposed methodology. The work suggests that, with the help of deep data mining considering the physical mechanisms, deep learning methods can partially mitigate the challenge of limited data in materials science and provide a means for solving complex problems with small databases.
The development of data-driven artificial intelligence technology has given birth to a variety of big data applications. Data has become an essential factor to improve these applications. Federated learning, a privacy-preserving machine learning method, is proposed to leverage data from different data owners. It is typically used in conjunction with cryptographic methods, in which data owners train the global model by sharing encrypted model updates. However, data encryption makes it difficult to identify the quality of these model updates. Malicious data owners may launch attacks such as data poisoning and free-riding. To defend against such attacks, it is necessary to find an approach to audit encrypted model updates. In this paper, we propose a blockchain-based audit approach for encrypted gradients. It uses a behavior chain to record the encrypted gradients from data owners, and an audit chain to evaluate the gradients’ quality. Specifically, we propose a privacy-preserving homomorphic noise mechanism in which the noise of each gradient sums to zero after aggregation, ensuring the availability of the aggregated gradient. In addition, we design a joint audit algorithm that can locate malicious data owners without decrypting individual gradients. Through security analysis and experimental evaluation, we demonstrate that our approach can defend against malicious gradient attacks in federated learning.
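The property that per-owner noise sums to zero after aggregation can be shown with a small plaintext sketch. This is only an illustration of the cancellation idea under simplifying assumptions: the paper's mechanism operates on encrypted gradients, whereas the hypothetical helper `zero_sum_masks` below generates all masks in one place and works on plain floats.

```python
import random


def zero_sum_masks(num_owners, dim, rng):
    """Return one noise vector per owner; the vectors sum to zero per coordinate."""
    masks = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(num_owners - 1)]
    # The last mask cancels the sum of all the others.
    masks.append([-sum(m[j] for m in masks) for j in range(dim)])
    return masks


def aggregate(vectors):
    """Coordinate-wise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


rng = random.Random(0)
gradients = [[0.5, -1.0, 2.0], [1.5, 0.0, -1.0], [-1.0, 1.0, 1.0]]
masks = zero_sum_masks(len(gradients), 3, rng)

# Each owner only reveals its masked gradient.
masked = [[g + m for g, m in zip(grad, mask)]
          for grad, mask in zip(gradients, masks)]

plain_sum = aggregate(gradients)
masked_sum = aggregate(masked)  # equals plain_sum up to floating-point rounding
```

Each individual masked gradient differs from the true one, yet the aggregate is unchanged, which is why the aggregated gradient remains usable.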
As the volume of healthcare and medical data increases from diverse sources, real-world scenarios involving data sharing and collaboration face certain challenges, including the risk of privacy leakage, difficulty in data fusion, low reliability of data storage, and low effectiveness of data sharing. To guarantee the service quality of data collaboration, this paper presents a privacy-preserving Healthcare and Medical Data Collaboration Service System combining Blockchain with Federated Learning, termed FL-HMChain. This system is composed of three layers: data extraction and storage, data management, and data application. Focusing on healthcare and medical data, a healthcare and medical blockchain is constructed to realize data storage, transfer, processing, and access with security, real-time performance, reliability, and integrity. An improved master node selection consensus mechanism is presented to detect and prevent dishonest behavior, ensuring the overall reliability and trustworthiness of the collaborative model training process. Furthermore, healthcare and medical data collaboration services in real-world scenarios have been discussed and developed. To further validate the performance of FL-HMChain, a Convolutional Neural Network-based Federated Learning (FL-CNN-HMChain) model is investigated for medical image identification. This model achieves better performance than the baseline Convolutional Neural Network (CNN), with average improvements of 4.7% in Area Under Curve (AUC) and 7% in Accuracy (ACC). Furthermore, the probability of privacy leakage can be effectively reduced by the blockchain-based parameter transfer mechanism in federated learning between local and global models.
Nowadays, smart wearable devices, which record human physiological data in real time, are widely used in the Social Internet of Things (IoT). To protect the data privacy of smart devices, researchers are paying more attention to federated learning. Although the data leakage problem is somewhat solved, a new challenge has emerged. Asynchronous federated learning shortens the convergence time, but it suffers from time delay and data heterogeneity problems. Both problems harm accuracy. To overcome these issues, we propose an asynchronous federated learning scheme based on double compensation that addresses both time delay and data heterogeneity. The scheme improves the Delay Compensated Asynchronous Stochastic Gradient Descent (DC-ASGD) algorithm based on the second-order Taylor expansion as the delay compensation. It adds the FedProx operator to the objective function as the heterogeneity compensation. Besides, the proposed scheme motivates the federated learning process by adjusting the importance of the participants and the central server. We conduct multiple sets of experiments in both conventional and heterogeneous scenarios. The experimental results show that our scheme improves the accuracy by about 5% while keeping the complexity constant. Numerical experiments also show that our scheme converges more smoothly during training and adapts better to heterogeneous environments. The proposed double-compensation-based federated learning scheme is highly accurate, is flexible in terms of participants, and smooths the training process. Hence it is deemed suitable for data privacy protection of smart wearable devices.
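The two compensation terms named in this abstract can be sketched in their standard textbook forms. This is not the paper's improved algorithm, whose details the abstract does not give. The sketch assumes the usual DC-ASGD correction, which adds `lam * g * g * (w_current - w_stale)` to a stale gradient `g`, and the usual FedProx proximal term `(mu / 2) * ||w - w_global||^2`, whose gradient is `mu * (w - w_global)`. All function and parameter names are illustrative.

```python
def fedprox_gradient(grad, w_local, w_global, mu):
    """Gradient of the local loss plus the FedProx proximal term."""
    return [g + mu * (wl - wg) for g, wl, wg in zip(grad, w_local, w_global)]


def dc_asgd_step(w_current, w_stale, stale_grad, lr, lam):
    """One server step using a delay-compensated stale gradient.

    stale_grad was computed at w_stale; the element-wise term
    lam * g * g * (w_current - w_stale) approximates the change in the
    gradient caused by the server moving from w_stale to w_current.
    """
    compensated = [g + lam * g * g * (wc - ws)
                   for g, wc, ws in zip(stale_grad, w_current, w_stale)]
    return [wc - lr * c for wc, c in zip(w_current, compensated)]


# A worker read w_stale, and the server has since moved to w_current.
w_stale = [1.0, 2.0]
w_current = [1.5, 2.0]
raw_grad = [2.0, -1.0]

# Heterogeneity compensation on the worker (here w_local equals w_stale,
# so the proximal term is zero), then delay compensation on the server.
local_grad = fedprox_gradient(raw_grad, w_stale, w_stale, mu=0.1)
w_next = dc_asgd_step(w_current, w_stale, local_grad, lr=0.1, lam=0.5)
```

When `w_current` equals `w_stale`, the delay term vanishes and the step reduces to plain gradient descent. When `mu` is zero, the local gradient is the unmodified loss gradient.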
With the advent of the era of big data, the exponential growth of data generation has provided unprecedented opportunities for innovation and insight in various fields. However, increasing privacy and security concerns and the existence of the phenomenon of “data silos” limit the collaborative utilization of data. This paper systematically discusses the technological progress of federated learning, including its basic framework, model optimization, communication efficiency improvement, privacy protection mechanism, and integration with other technologies. It then analyzes the broad applications of federated learning in healthcare, the Internet of Things, Internet of Vehicles, smart cities, and financial services, and summarizes its challenges in data heterogeneity, communication overhead, privacy protection, scalability, and security. Finally, this paper looks forward to the future development direction of federated learning and proposes potential research paths in efficient algorithm design, privacy protection mechanism optimization, heterogeneous data processing, and cross-industry collaboration.
Image classification is crucial for various applications, including digital construction, smart manufacturing, and medical imaging. Focusing on the inadequate model generalization and data privacy concerns in few-shot image classification, in this paper, we propose a federated learning approach that incorporates privacy-preserving techniques. First, we utilize contrastive learning to train on local few-shot image data and apply various data augmentation methods to expand the sample size, thereby enhancing the model’s generalization capabilities in few-shot contexts. Second, we introduce local differential privacy techniques and weight pruning methods to safeguard model parameters, perturbing the transmitted parameters to ensure user data privacy. Finally, numerical simulations are conducted to demonstrate the effectiveness of our proposed method. The results indicate that our approach significantly enhances model generalization and test accuracy compared to several popular federated learning algorithms while maintaining data privacy, highlighting its effectiveness and practicality in addressing the challenges of model generalization and data privacy in few-shot image scenarios.
The proliferation of deep learning (DL) has amplified the demand for processing large and complex datasets for tasks such as modeling, classification, and identification. However, traditional DL methods compromise client privacy by collecting sensitive data, underscoring the necessity for privacy-preserving solutions like Federated Learning (FL). FL effectively addresses escalating privacy concerns by facilitating collaborative model training without necessitating the sharing of raw data. Given that FL clients autonomously manage training data, encouraging client engagement is pivotal for successful model training. To overcome challenges like unreliable communication and budget constraints, we present ENTIRE, a contract-based dynamic participation incentive mechanism for FL. ENTIRE ensures impartial model training by tailoring participation levels and payments to accommodate diverse client preferences. Our approach involves several key steps. Initially, we examine how random client participation impacts FL convergence in non-convex scenarios, establishing the correlation between client participation levels and model performance. Subsequently, we reframe model performance optimization as an optimal contract design challenge to guide the distribution of rewards among clients with varying participation costs. By balancing budget considerations with model effectiveness, we craft optimal contracts for different budgetary constraints, prompting clients to disclose their participation preferences and select suitable contracts for contributing to model training. Finally, we conduct a comprehensive experimental evaluation of ENTIRE using three real datasets. The results demonstrate a significant 12.9% enhancement in model performance, validating its adherence to anticipated economic properties.
Environmental transition can potentially influence cardiovascular health. Investigating the relationship between such transition and heart disease has important applications. This study uses federated learning (FL) in this context and investigates the link between climate change and heart disease. The dataset containing environmental, meteorological, and health-related factors like blood sugar, cholesterol, maximum heart rate, fasting ECG, etc., is used with machine learning models to identify hidden patterns and relationships. Algorithms such as federated learning, XGBoost, random forest, support vector classifier, extra tree classifier, k-nearest neighbor, and logistic regression are used. A framework for diagnosing heart disease is designed using FL along with other models. Experiments involve discriminating healthy subjects from those who are heart patients and obtain an accuracy of 94.03%. The proposed FL-based framework proves to be superior to existing techniques in terms of usability, dependability, and accuracy. This study paves the way for screening people for early heart disease detection and continuous monitoring in telemedicine and remote care. Personalized treatment can also be planned with customized therapies.
With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to efficiently utilize this data while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users’ sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
As the information sensing and processing capabilities of IoT devices increase, a large amount of data is being generated at the edge of Industrial IoT (IIoT), which has become a strong foundation for distributed Artificial Intelligence (AI) applications. However, most users are reluctant to disclose their data due to network bandwidth limitations, device energy consumption, and privacy requirements. To address this issue, this paper introduces an Edge-assisted Federated Learning (EFL) framework, along with an incentive mechanism for lightweight industrial data sharing. In order to reduce the information asymmetry between data owners and users, an EFL model-sharing incentive mechanism based on contract theory is designed. In addition, a weight dispersion evaluation scheme based on Wasserstein distance is proposed. This study models an optimization problem of node selection and sharing incentives to maximize the EFL model consumers’ profit and ensure the quality of training services. An incentive-based EFL algorithm with individual rationality and incentive compatibility constraints is proposed. Finally, the experimental results verify the effectiveness of the proposed scheme in terms of positive incentives for contract design and performance analysis of EFL systems.
With the rapid development of artificial intelligence and Internet of Things technologies, video action recognition technology is widely applied in various scenarios, such as personal life and industrial production. However, while enjoying the convenience brought by this technology, it is crucial to effectively protect the privacy of users’ video data. Therefore, this paper proposes a video action recognition method based on personalized federated learning and spatiotemporal features. Under the framework of federated learning, a video action recognition method leveraging spatiotemporal features is designed. For the local spatiotemporal features of the video, a new differential information extraction scheme is proposed to extract differential features with a single RGB frame as the center, and a spatiotemporal module based on local information is designed to improve the effectiveness of local feature extraction. For the global temporal features, a method of extracting action rhythm features using differential technology is proposed, and a time module based on global information is designed. Different translational strides are used in the module to obtain bidirectional differential features under different action rhythms. Additionally, to address user data privacy issues, the method divides model parameters into local private parameters and public parameters based on the structure of the video action recognition model. This approach enhances model training performance and ensures the security of video data. The experimental results show that under personalized federated learning conditions, an average accuracy of 97.792% was achieved on the UCF-101 dataset, which is non-independent and identically distributed (non-IID). This research provides technical support for privacy protection in video action recognition.
Federated Learning (FL) has emerged as a promising distributed machine learning paradigm that enables multi-party collaborative training while eliminating the need for raw data sharing. However, its reliance on a server introduces critical security vulnerabilities: malicious servers can infer private information from received local model updates or deliberately manipulate aggregation results. Consequently, achieving verifiable aggregation without compromising client privacy remains a critical challenge. To address these problems, we propose a reversible data hiding in encrypted domains (RDHED) scheme, which provides a joint secret message embedding and extraction mechanism. This approach enables clients to embed secret messages into ciphertext redundancy spaces generated during model encryption. During the server aggregation process, the embedded messages from all clients fuse within the ciphertext space to form a joint embedding message. Subsequently, clients can decrypt the aggregated results and extract this joint embedding message for verification purposes. Building upon this foundation, we integrate the proposed RDHED scheme with linear homomorphic hash and digital signatures to design a verifiable privacy-preserving aggregation protocol for single-server architectures (VPAFL). Theoretical proofs and experimental analyses show that VPAFL can effectively protect user privacy, achieve lightweight computational and communication overhead for user-side verification, and present significant advantages with increasing model dimension.
Sharing data while protecting privacy in the industrial Internet is a significant challenge. Traditional machine learning methods require a combination of all data for training; however, this approach can be limited by data availability and privacy concerns. Federated learning (FL) has gained considerable attention because it allows for decentralized training on multiple local datasets. However, the training data collected by data providers are often non-independent and identically distributed (non-IID), resulting in poor FL performance. This paper proposes a privacy-preserving approach for sharing non-IID data in the industrial Internet using an FL approach based on blockchain technology. To overcome the problem of non-IID data leading to poor training accuracy, we propose dynamically updating the local model based on the divergence of the global and local models. This approach can significantly improve the accuracy of FL training when there is relatively large dispersion. In addition, we design a dynamic gradient clipping algorithm to alleviate the influence of noise on the model accuracy to reduce potential privacy leakage caused by sharing model parameters. Finally, we evaluate the performance of the proposed scheme using commonly used open-source image datasets. The simulation results demonstrate that our method can significantly enhance the accuracy while protecting privacy and maintaining efficiency, thereby providing a new solution to data-sharing and privacy-protection challenges in the industrial Internet.
In the financial sector, data are highly confidential and sensitive, and ensuring data privacy is critical. Sample fusion is the basis of horizontal federated learning, but it is suitable only for scenarios where customers have the same format but different targets, namely for scenarios with strong feature overlapping and weak user overlapping. To address this limitation, this paper proposes a federated learning-based model with local data sharing and differential privacy. The indexing mechanism of differential privacy is used to obtain different degrees of privacy budgets, which are applied to the gradient according to the contribution degree to ensure privacy without affecting accuracy. In addition, data sharing is performed to improve the utility of the global model. Further, the distributed prediction model is used to predict customers’ loan propensity on the premise of protecting user privacy. Using an aggregation mechanism based on federated learning can help to train the model on distributed data without exposing local data. The proposed method is verified by experiments, and experimental results show that for non-IID data, the proposed method can effectively improve data accuracy and reduce the impact of sample tilt. The proposed method can be extended to edge computing, blockchain, and the Industrial Internet of Things (IIoT) fields. The theoretical analysis and experimental results show that the proposed method can ensure the privacy and accuracy of the federated learning process and can also improve the model utility for non-IID data by 7% compared to the federated averaging method (FedAvg).
As a promising edge learning framework in future 6G networks, federated learning (FL) faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors. Data imbalance is one of these challenges and can significantly degrade the learning efficiency. To deal with the data imbalance issue, this work proposes a new learning framework, called clustered federated learning with weighted model aggregation (weighted CFL). Compared with traditional FL, our weighted CFL adaptively clusters the participating edge devices based on the cosine similarity of their local gradients at each training iteration, and then performs weighted per-cluster model aggregation. Therein, the similarity threshold for clustering is adaptive over iterations in response to the time-varying divergence of local gradients. Moreover, the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up the convergence rate. Experimental results show that the proposed weighted CFL achieves a faster model convergence rate and greater learning accuracy than benchmark methods under the imbalanced data scenario.
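The two steps described in this abstract, clustering devices by the cosine similarity of their local gradients and then aggregating per cluster with weights, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm. The greedy assignment rule and the fixed `threshold` are hypothetical (the paper adapts the threshold over iterations), and the per-device `weights` are placeholders for the paper's unspecified data balance feature.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_by_cosine(grads, threshold):
    """Greedy clustering: join the first cluster whose first member is similar."""
    clusters = []  # each cluster is a list of device indices
    for i, g in enumerate(grads):
        for members in clusters:
            if cosine(g, grads[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[j] for v, w in zip(vectors, weights)) / total
            for j in range(dim)]


# Four devices: two pointing roughly along +x, two roughly along -x.
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.8, -0.2]]
weights = [1.0, 3.0, 2.0, 2.0]  # placeholder per-device weights

clusters = cluster_by_cosine(grads, threshold=0.5)
cluster_updates = [
    weighted_average([grads[i] for i in c], [weights[i] for i in c])
    for c in clusters
]
```

Devices whose gradients point in opposing directions end up in separate clusters, so their updates no longer cancel each other during aggregation.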
Due to the extensive use of various intelligent terminals and the popularity of network social tools, a large amount of data has emerged in the medical field. How to manage these massive data safely and reliably has become an important challenge for the medical network community. This paper proposes a data management framework for the medical network community based on Consortium Blockchain (CB) and Federated Learning (FL), which realizes secure data sharing between medical institutions and research institutions. Under this framework, a secure data sharing mechanism for the medical network community based on smart contracts and a data privacy protection mechanism based on FL and the alliance chain are designed to ensure the security of data and the privacy of important data in the medical network community, respectively. An intelligent contract system based on the Keyed-Homomorphic Public Key Encryption (KH-PKE) scheme is designed, so that medical data can be saved in the CB in the form of ciphertext, and the automatic sharing of data is realized. A zero-knowledge mechanism is used to ensure the correctness of shared data. Moreover, the zero-knowledge mechanism introduces a dynamic group signature mechanism with chosen ciphertext attack (CCA) anonymity, which makes the scheme more efficient in computing and communication cost. At the end of this paper, the performance of the scheme is analyzed from both asymptotic and practical aspects. Experimental comparative analysis shows that the scheme proposed in this paper is more effective and feasible.
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01B187).
Abstract: Heterogeneous federated learning (HtFL) has gained significant attention due to its ability to accommodate diverse models and data from distributed combat units. Prototype-based HtFL methods were proposed to reduce the high communication cost of transmitting model parameters. These methods share only class representatives between heterogeneous clients while maintaining privacy. However, existing prototype learning approaches do not take the clients' data distributions into consideration, which results in suboptimal global prototype learning and insufficient client model personalization. To address these issues, we propose a fair trainable prototype federated learning (FedFTP) algorithm, which employs a fair sampling training prototype (FSTP) mechanism and a hyperbolic space constraints (HSC) mechanism to enhance the fairness and effectiveness of prototype learning on the server in heterogeneous environments. Furthermore, a local prototype stable update (LPSU) mechanism based on contrastive learning is proposed to maintain personalization while promoting global consistency. Comprehensive experimental results demonstrate that FedFTP achieves state-of-the-art performance in HtFL scenarios.
Funding: Supported and funded by the National Natural Science Foundation of China (82101079), the Key R&D Program of Jiangsu Province (BE2023836), and the National Key Research and Development Program of China (SQ2023YFC2400025).
Abstract: As the integration of medical big data and artificial intelligence advances, the secure sharing of medical data has become a key driving force for disease research and clinical diagnosis. Federated learning, a distributed approach enabling collaborative data processing without sharing raw data, offers promising solutions to challenges in multi-center medical data sharing. This review summarizes the progress of federated learning in multi-center medical data processing from four perspectives: system architectures, data distribution strategies, clinical tasks, and algorithmic models. It also explores the challenges in practical applications, such as data heterogeneity, communication overhead, and privacy concerns, and proposes driving future research by optimizing algorithms, strengthening privacy protection mechanisms, and enhancing computational efficiency.
Abstract: Federated Learning (FL) is a practical solution that leverages distributed data across devices without the need for centralized data storage, enabling multiple participants to jointly train models while preserving data privacy and avoiding direct data sharing. Despite its privacy-preserving advantages, FL remains vulnerable to backdoor attacks, where malicious participants introduce backdoors into local models that are then propagated to the global model through the aggregation process. While existing differential privacy defenses have demonstrated effectiveness against backdoor attacks in FL, they often incur a significant degradation in the performance of the aggregated models on benign tasks. To address this limitation, we propose a novel backdoor defense mechanism based on differential privacy. Our approach first utilizes the inherent out-of-distribution characteristics of backdoor samples to identify and exclude malicious model updates that significantly deviate from benign models. By filtering out models that are clearly backdoor-infected before applying differential privacy, our method reduces the noise level required for differential privacy, thereby enhancing model robustness while preserving performance. Experimental evaluations on the CIFAR10 and FEMNIST datasets demonstrate that our method effectively limits the backdoor accuracy to below 15% across various backdoor scenarios while maintaining high main task accuracy.
Funding: Supported by the Ministry of Education Industry-University Cooperation Collaborative Education Projects of China under Grants 202102119036 and 202102082013.
Abstract: Data sharing technology in the Internet of Vehicles (IoV) has attracted great research interest with the goal of realizing intelligent transportation and traffic management. Meanwhile, major concerns have been raised about the security and privacy of vehicle data. The mobility and real-time characteristics of vehicle data make data sharing more difficult in IoV. The emergence of blockchain and federated learning brings new directions. In this paper, a data-sharing model that combines blockchain and federated learning is proposed to solve the security and privacy problems of data sharing in IoV. First, we use federated learning to share data instead of exposing actual data, and propose an adaptive differential privacy scheme to further balance the privacy and availability of data. Then, we integrate the verification scheme into the consensus process, so that the consensus computation can filter out low-quality models. Experimental data show that our data-sharing model better balances data availability and privacy, and also has enhanced security.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 51801019 and U1808208).
Abstract: The martensite start temperature is a critical parameter for steels with metastable austenite. Although numerous models have been developed to predict the martensite start (Ms) temperature, the complexity of the martensitic transformation greatly limits their performance and extensibility. In this work, we apply deep data mining of thermodynamic calculations and deep learning to develop a generic model for Ms prediction. Deep data mining was used to establish a hierarchical database with three levels of information. Then, a convolutional neural network model, which can accurately treat the hierarchical data structure, was used to obtain the final model. By integrating thermodynamic calculations, traditional machine learning, and deep learning modeling, the final predictor model shows excellent generalizability and extensibility, i.e., model performance both within and beyond the composition range of the original database. The effects of 15 alloying elements were considered successfully using the proposed methodology. The work suggests that, with the help of deep data mining that considers the physical mechanisms, deep learning methods can partially mitigate the challenge of limited data in materials science and provide a means for solving complex problems with small databases.
Funding: This research is sponsored by the National Key R&D Program of China (No. 2018YFB2100400), the National Natural Science Foundation of China (Nos. 62002077, 61872100), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110385), the Strategic Research and Consultation Project of the Chinese Academy of Engineering (No. 2021-HYZD-8-3), the China Postdoctoral Science Foundation (No. 2020M682657), and Zhejiang Lab (No. 2020NF0AB01).
Abstract: The development of data-driven artificial intelligence technology has given birth to a variety of big data applications. Data has become an essential factor in improving these applications. Federated learning, a privacy-preserving machine learning method, is proposed to leverage data from different data owners. It is typically used in conjunction with cryptographic methods, in which data owners train the global model by sharing encrypted model updates. However, data encryption makes it difficult to identify the quality of these model updates. Malicious data owners may launch attacks such as data poisoning and free-riding. To defend against such attacks, it is necessary to find an approach to audit encrypted model updates. In this paper, we propose a blockchain-based audit approach for encrypted gradients. It uses a behavior chain to record the encrypted gradients from data owners, and an audit chain to evaluate the gradients' quality. Specifically, we propose a privacy-preserving homomorphic noise mechanism in which the noise of each gradient sums to zero after aggregation, ensuring the availability of the aggregated gradient. In addition, we design a joint audit algorithm that can locate malicious data owners without decrypting individual gradients. Through security analysis and experimental evaluation, we demonstrate that our approach can defend against malicious gradient attacks in federated learning.
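The zero-sum noise idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's mechanism: there is no encryption, blockchain, or audit step here, and the function names (`zero_sum_noise`, `mask_gradients`, `aggregate`) and the uniform noise distribution are assumptions made purely for illustration. It only shows why per-owner noise that sums to zero leaves the aggregated gradient usable.

```python
import random


def zero_sum_noise(num_owners, dim, scale=1.0, seed=0):
    """Return one noise vector per owner; the vectors sum to zero per coordinate.

    Hypothetical helper: the first num_owners - 1 vectors are random and the
    last one is chosen to cancel them.
    """
    rng = random.Random(seed)
    noise = [[rng.uniform(-scale, scale) for _ in range(dim)]
             for _ in range(num_owners - 1)]
    noise.append([-sum(row[d] for row in noise) for d in range(dim)])
    return noise


def mask_gradients(gradients, noise):
    """Add each owner's noise vector to that owner's gradient."""
    return [[g + n for g, n in zip(grad, nz)]
            for grad, nz in zip(gradients, noise)]


def aggregate(vectors):
    """Coordinate-wise sum over all owners."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) for d in range(dim)]


# Three hypothetical data owners with two-dimensional gradients.
gradients = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]
masked = mask_gradients(gradients, zero_sum_noise(len(gradients), dim=2))
true_sum = aggregate(gradients)
masked_sum = aggregate(masked)
```

Each masked gradient differs from the original, while `masked_sum` matches `true_sum` up to floating-point rounding, which is the property the abstract calls the availability of the aggregated gradient.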
Funding: We are thankful for the funding support from the Science and Technology Projects of the National Archives Administration of China (Grant Number 2022-R-031) and the Fundamental Research Funds for the Central Universities, Central China Normal University (Grant Number CCNU24CG014).
Abstract: As the volume of healthcare and medical data from diverse sources increases, real-world scenarios involving data sharing and collaboration face several challenges, including the risk of privacy leakage, difficulty in data fusion, low reliability of data storage, and low effectiveness of data sharing. To guarantee the service quality of data collaboration, this paper presents a privacy-preserving Healthcare and Medical Data Collaboration Service System combining Blockchain with Federated Learning, termed FL-HMChain. The system is composed of three layers: data extraction and storage, data management, and data application. Focusing on healthcare and medical data, a healthcare and medical blockchain is constructed to realize secure, real-time, and reliable data storage, transfer, processing, and access with integrity. An improved master node selection consensus mechanism is presented to detect and prevent dishonest behavior, ensuring the overall reliability and trustworthiness of the collaborative model training process. Furthermore, healthcare and medical data collaboration services in real-world scenarios are discussed and developed. To further validate the performance of FL-HMChain, a Convolutional Neural Network-based Federated Learning (FL-CNN-HMChain) model is investigated for medical image identification. This model achieves better performance than the baseline Convolutional Neural Network (CNN), with average improvements of 4.7% in Area Under Curve (AUC) and 7% in Accuracy (ACC). Furthermore, the blockchain-based parameter transfer mechanism between local and global models in federated learning can effectively reduce the probability of privacy leakage.
Funding: Supported by the National Natural Science Foundation of China, No. 61977006.
Abstract: Nowadays, smart wearable devices, which record human physiological data in real time, are widely used in the Social Internet of Things (IoT). To protect the data privacy of smart devices, researchers are paying more attention to federated learning. Although the data leakage problem is partly solved, a new challenge has emerged. Asynchronous federated learning shortens the convergence time, but it suffers from time delay and data heterogeneity, both of which harm accuracy. To overcome these issues, we propose an asynchronous federated learning scheme based on double compensation. The scheme improves the Delay Compensated Asynchronous Stochastic Gradient Descent (DC-ASGD) algorithm based on the second-order Taylor expansion as the delay compensation. It adds the FedProx operator to the objective function as the heterogeneity compensation. In addition, the proposed scheme motivates the federated learning process by adjusting the importance of the participants and the central server. We conduct multiple sets of experiments in both conventional and heterogeneous scenarios. The experimental results show that our scheme improves the accuracy by about 5% while keeping the complexity constant. Numerical experiments also show that our scheme converges more smoothly during training and adapts better to heterogeneous environments. The proposed double-compensation-based federated learning scheme is highly accurate, flexible in terms of participants, and smooths the training process. Hence, it is deemed suitable for data privacy protection of smart wearable devices.
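To make the two compensation terms named in the abstract above more concrete, the following minimal sketch shows the standard forms they are usually given in the literature, not the paper's improved variant. It assumes the commonly cited DC-ASGD correction, g + λ·g⊙g⊙(w_current − w_stale), and the FedProx proximal term (μ/2)‖w − w_global‖², whose gradient is μ(w − w_global). The function names and the hyperparameter values `lam` and `mu` are hypothetical.

```python
def delay_compensated_gradient(grad, w_stale, w_current, lam):
    """Standard DC-ASGD style correction for a gradient computed at stale weights.

    The element-wise product g * g serves as a cheap diagonal approximation of
    the Hessian, scaled by lam (an assumed hyperparameter).
    """
    return [g + lam * g * g * (wc - ws)
            for g, ws, wc in zip(grad, w_stale, w_current)]


def fedprox_gradient(grad, w_local, w_global, mu):
    """Gradient of the local loss plus the FedProx proximal term.

    Adding (mu / 2) * ||w_local - w_global||^2 to the objective contributes
    mu * (w_local - w_global) to the gradient, pulling clients toward the
    global model under heterogeneous data.
    """
    return [g + mu * (wl - wg)
            for g, wl, wg in zip(grad, w_local, w_global)]


# Toy two-parameter example with hypothetical values.
compensated = delay_compensated_gradient(
    grad=[1.0, -2.0], w_stale=[0.0, 0.0], w_current=[0.5, 0.5], lam=0.1)
proximal = fedprox_gradient(
    grad=[1.0, -2.0], w_local=[1.0, 1.0], w_global=[0.0, 2.0], mu=0.5)
```

In this toy case the delay correction grows with both the gradient magnitude and the distance the server weights have moved, while the proximal term grows with the client's drift from the global model.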
Abstract: With the advent of the era of big data, the exponential growth of data generation has provided unprecedented opportunities for innovation and insight in various fields. However, increasing privacy and security concerns and the existence of "data silos" limit the collaborative utilization of data. This paper systematically discusses the technological progress of federated learning, including its basic framework, model optimization, communication efficiency improvement, privacy protection mechanisms, and integration with other technologies. It then analyzes the broad applications of federated learning in healthcare, the Internet of Things, the Internet of Vehicles, smart cities, and financial services, and summarizes its challenges in data heterogeneity, communication overhead, privacy protection, scalability, and security. Finally, this paper looks ahead to the future development of federated learning and proposes potential research paths in efficient algorithm design, privacy protection mechanism optimization, heterogeneous data processing, and cross-industry collaboration.
Funding: Supported by the Suzhou Science and Technology Plan (Basic Research) Project under Grant SJC2023002 and the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant KYCX23_3322.
Abstract: Image classification is crucial for various applications, including digital construction, smart manufacturing, and medical imaging. Focusing on inadequate model generalization and data privacy concerns in few-shot image classification, in this paper we propose a federated learning approach that incorporates privacy-preserving techniques. First, we utilize contrastive learning to train on local few-shot image data and apply various data augmentation methods to expand the sample size, thereby enhancing the model's generalization capabilities in few-shot contexts. Second, we introduce local differential privacy techniques and weight pruning methods to safeguard model parameters, perturbing the transmitted parameters to ensure user data privacy. Finally, numerical simulations are conducted to demonstrate the effectiveness of our proposed method. The results indicate that our approach significantly enhances model generalization and test accuracy compared to several popular federated learning algorithms while maintaining data privacy, highlighting its effectiveness and practicality in addressing the challenges of model generalization and data privacy in few-shot image scenarios.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62072411, 62372343, 62402352, 62403500), the Key Research and Development Program of Hubei Province (No. 2023BEB024), and the Open Fund of Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education (No. SCCI2024TB02).
Abstract: The proliferation of deep learning (DL) has amplified the demand for processing large and complex datasets for tasks such as modeling, classification, and identification. However, traditional DL methods compromise client privacy by collecting sensitive data, underscoring the necessity for privacy-preserving solutions like Federated Learning (FL). FL effectively addresses escalating privacy concerns by facilitating collaborative model training without necessitating the sharing of raw data. Given that FL clients autonomously manage training data, encouraging client engagement is pivotal for successful model training. To overcome challenges like unreliable communication and budget constraints, we present ENTIRE, a contract-based dynamic participation incentive mechanism for FL. ENTIRE ensures impartial model training by tailoring participation levels and payments to accommodate diverse client preferences. Our approach involves several key steps. Initially, we examine how random client participation impacts FL convergence in non-convex scenarios, establishing the correlation between client participation levels and model performance. Subsequently, we reframe model performance optimization as an optimal contract design challenge to guide the distribution of rewards among clients with varying participation costs. By balancing budget considerations with model effectiveness, we craft optimal contracts for different budgetary constraints, prompting clients to disclose their participation preferences and select suitable contracts for contributing to model training. Finally, we conduct a comprehensive experimental evaluation of ENTIRE using three real datasets. The results demonstrate a significant 12.9% enhancement in model performance, validating its adherence to anticipated economic properties.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Environmental transition can potentially influence cardiovascular health. Investigating the relationship between such transition and heart disease has important applications. This study uses federated learning (FL) in this context and investigates the link between climate change and heart disease. A dataset containing environmental, meteorological, and health-related factors such as blood sugar, cholesterol, maximum heart rate, and fasting ECG is used with machine learning models to identify hidden patterns and relationships. Algorithms such as federated learning, XGBoost, random forest, support vector classifier, extra tree classifier, k-nearest neighbor, and logistic regression are used. A framework for diagnosing heart disease is designed using FL along with other models. Experiments involve discriminating healthy subjects from heart patients and obtain an accuracy of 94.03%. The proposed FL-based framework proves to be superior to existing techniques in terms of usability, dependability, and accuracy. This study paves the way for screening people for early heart disease detection and continuous monitoring in telemedicine and remote care. Personalized treatment can also be planned with customized therapies.
Abstract: With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to utilize this data efficiently while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show that the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
Funding: Supported by the National Natural Science Foundation of China (No. 62071070), the Major Science and Technology Special Project of the Science and Technology Department of Yunnan Province (202002AB080001-8), and the BUPT Innovation & Entrepreneurship Support Program (2023-YC-T031).
Abstract: As the information sensing and processing capabilities of IoT devices increase, a large amount of data is being generated at the edge of the Industrial IoT (IIoT), which has become a strong foundation for distributed Artificial Intelligence (AI) applications. However, most users are reluctant to disclose their data due to network bandwidth limitations, device energy consumption, and privacy requirements. To address this issue, this paper introduces an Edge-assisted Federated Learning (EFL) framework, along with an incentive mechanism for lightweight industrial data sharing. To reduce the information asymmetry between data owners and users, an EFL model-sharing incentive mechanism based on contract theory is designed. In addition, a weight dispersion evaluation scheme based on the Wasserstein distance is proposed. This study models an optimization problem of node selection and sharing incentives to maximize the EFL model consumers' profit and ensure the quality of training services. An incentive-based EFL algorithm with individual rationality and incentive compatibility constraints is proposed. Finally, the experimental results verify the effectiveness of the proposed scheme in terms of positive incentives for contract design and performance analysis of EFL systems.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62071098) and the Sichuan Science and Technology Program (Grants 2022YFG0319, 2023YFG0301, and 2023YFG0018).
Abstract: With the rapid development of artificial intelligence and Internet of Things technologies, video action recognition technology is widely applied in various scenarios, such as personal life and industrial production. However, while enjoying the convenience brought by this technology, it is crucial to effectively protect the privacy of users' video data. Therefore, this paper proposes a video action recognition method based on personalized federated learning and spatiotemporal features. Under the framework of federated learning, a video action recognition method leveraging spatiotemporal features is designed. For the local spatiotemporal features of the video, a new differential information extraction scheme is proposed to extract differential features with a single RGB frame as the center, and a spatial-temporal module based on local information is designed to improve the effectiveness of local feature extraction. For the global temporal features, a method of extracting action rhythm features using differential technology is proposed, and a time module based on global information is designed. Different translational strides are used in the module to obtain bidirectional differential features under different action rhythms. Additionally, to address user data privacy issues, the method divides model parameters into local private parameters and public parameters based on the structure of the video action recognition model. This approach enhances model training performance and ensures the security of video data. The experimental results show that under personalized federated learning conditions, an average accuracy of 97.792% was achieved on the UCF-101 dataset, which is non-independent and identically distributed (non-IID). This research provides technical support for privacy protection in video action recognition.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62102450 and 62272478, and the Independent Research Project of a Certain Unit under Grant ZZKY20243127.
Abstract: Federated Learning (FL) has emerged as a promising distributed machine learning paradigm that enables multi-party collaborative training while eliminating the need for raw data sharing. However, its reliance on a server introduces critical security vulnerabilities: malicious servers can infer private information from received local model updates or deliberately manipulate aggregation results. Consequently, achieving verifiable aggregation without compromising client privacy remains a critical challenge. To address this problem, we propose a reversible data hiding in encrypted domains (RDHED) scheme, which designs a joint secret message embedding and extraction mechanism. This approach enables clients to embed secret messages into ciphertext redundancy spaces generated during model encryption. During the server aggregation process, the embedded messages from all clients fuse within the ciphertext space to form a joint embedding message. Subsequently, clients can decrypt the aggregated results and extract this joint embedding message for verification purposes. Building upon this foundation, we integrate the proposed RDHED scheme with linear homomorphic hash and digital signatures to design a verifiable privacy-preserving aggregation protocol for single-server architectures (VPAFL). Theoretical proofs and experimental analyses show that VPAFL can effectively protect user privacy, achieve lightweight computational and communication overhead for user-side verification, and present significant advantages with increasing model dimension.
Funding: This work was supported by the National Key R&D Program of China under Grant 2023YFB2703802 and the Hunan Province Innovation and Entrepreneurship Training Program for College Students S202311528073.
Abstract: Sharing data while protecting privacy in the industrial Internet is a significant challenge. Traditional machine learning methods require a combination of all data for training; however, this approach can be limited by data availability and privacy concerns. Federated learning (FL) has gained considerable attention because it allows for decentralized training on multiple local datasets. However, the training data collected by data providers are often non-independent and identically distributed (non-IID), resulting in poor FL performance. This paper proposes a privacy-preserving approach for sharing non-IID data in the industrial Internet using an FL approach based on blockchain technology. To overcome the problem of non-IID data leading to poor training accuracy, we propose dynamically updating the local model based on the divergence of the global and local models. This approach can significantly improve the accuracy of FL training when there is relatively large dispersion. In addition, we design a dynamic gradient clipping algorithm to alleviate the influence of noise on the model accuracy and to reduce potential privacy leakage caused by sharing model parameters. Finally, we evaluate the performance of the proposed scheme using commonly used open-source image datasets. The simulation results demonstrate that our method can significantly enhance accuracy while protecting privacy and maintaining efficiency, thereby providing a new solution to data-sharing and privacy-protection challenges in the industrial Internet.
Funding: Supported by the National Natural Science Foundation (NSFC), China, under the National Natural Science Foundation Youth Fund program (J. Hao, No. 62101275).
Abstract: In the financial sector, data are highly confidential and sensitive, and ensuring data privacy is critical. Sample fusion is the basis of horizontal federated learning, but it is suitable only for scenarios where customers have the same format but different targets, namely scenarios with strong feature overlap and weak user overlap. To address this limitation, this paper proposes a federated learning-based model with local data sharing and differential privacy. The indexing mechanism of differential privacy is used to obtain different degrees of privacy budgets, which are applied to the gradient according to the contribution degree to ensure privacy without affecting accuracy. In addition, data sharing is performed to improve the utility of the global model. Further, the distributed prediction model is used to predict customers' loan propensity on the premise of protecting user privacy. Using an aggregation mechanism based on federated learning helps to train the model on distributed data without exposing local data. The proposed method is verified by experiments, and the experimental results show that for non-IID data, the proposed method can effectively improve data accuracy and reduce the impact of sample tilt. The proposed method can be extended to edge computing, blockchain, and the Industrial Internet of Things (IIoT) fields. The theoretical analysis and experimental results show that the proposed method can ensure the privacy and accuracy of the federated learning process and can also improve the model utility for non-IID data by 7% compared to the federated averaging method (FedAvg).
Abstract: As a promising edge learning framework in future 6G networks, federated learning (FL) faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors. Data imbalance is one of these challenges and can significantly degrade learning efficiency. To deal with the data imbalance issue, this work proposes a new learning framework, called clustered federated learning with weighted model aggregation (weighted CFL). Compared with traditional FL, our weighted CFL adaptively clusters the participating edge devices based on the cosine similarity of their local gradients at each training iteration, and then performs weighted per-cluster model aggregation. Therein, the similarity threshold for clustering is adaptive over iterations in response to the time-varying divergence of local gradients. Moreover, the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up the convergence rate. Experimental results show that the proposed weighted CFL achieves a faster model convergence rate and greater learning accuracy than benchmark methods under the imbalanced data scenario.
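The two steps described in the abstract above, clustering devices by the cosine similarity of their local gradients and then aggregating per cluster with weights, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the paper's algorithm: the greedy first-fit clustering rule, the fixed `threshold` (the paper adapts it over iterations), the per-device `weights` (the paper derives them from a data balance feature that the abstract does not specify), and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(gradients, threshold):
    """Greedy first-fit clustering (an assumed rule, not the paper's).

    A device joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster.
    """
    clusters = []
    for idx, grad in enumerate(gradients):
        for cluster in clusters:
            if cosine_similarity(gradients[cluster[0]], grad) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def weighted_aggregate(gradients, weights, members):
    """Weighted average of the gradients of the devices listed in members."""
    total = sum(weights[i] for i in members)
    dim = len(gradients[0])
    return [sum(weights[i] * gradients[i][d] for i in members) / total
            for d in range(dim)]


# Four hypothetical edge devices with two-dimensional gradients.
gradients = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-0.1, 1.0]]
weights = [1.0, 3.0, 1.0, 1.0]  # stand-in for data-balance-based weights
clusters = cluster_by_similarity(gradients, threshold=0.8)
per_cluster = [weighted_aggregate(gradients, weights, c) for c in clusters]
```

With these toy values, devices 0 and 1 end up in one cluster and devices 2 and 3 in another, and the heavier weight on device 1 pulls the first cluster's aggregate toward its gradient.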
Funding: Supported by the NSFC (No. 62072249). Yongjun Ren received the grant, and the URL of the sponsor's website is https://www.nsfc.gov.cn/.
Abstract: Due to the extensive use of various intelligent terminals and the popularity of network social tools, a large amount of data has emerged in the medical field. How to manage these massive data safely and reliably has become an important challenge for the medical network community. This paper proposes a data management framework for the medical network community based on Consortium Blockchain (CB) and Federated Learning (FL), which realizes secure data sharing between medical institutions and research institutions. Under this framework, a data security sharing mechanism based on smart contracts and a data privacy protection mechanism based on FL and the alliance chain are designed to ensure, respectively, the security of data and the privacy of important data in the medical network community. An intelligent contract system based on the Keyed-Homomorphic Public Key Encryption (KH-PKE) scheme is designed, so that medical data can be saved in the CB in the form of ciphertext and the automatic sharing of data is realized. A zero-knowledge mechanism is used to ensure the correctness of shared data. Moreover, the zero-knowledge mechanism introduces a dynamic group signature mechanism with chosen ciphertext attack (CCA) anonymity, which makes the scheme more efficient in computing and communication cost. At the end of this paper, the performance of the scheme is analyzed from both asymptotic and practical aspects. Experimental comparative analysis shows that the scheme proposed in this paper is more effective and feasible.