Funding: Supported in part by the Core Electronic Devices, High-End Generic Chips, and Basic Software Major Special Projects (No. 2018ZX01028101) and the National Natural Science Foundation of China (Nos. 61907034, 61932001, and 61906200).
Abstract: The proliferation of massive datasets has led to significant interest in distributed algorithms for solving large-scale machine learning problems. However, communication overhead is a major bottleneck that hampers the scalability of distributed machine learning systems. In this paper, we design two communication-efficient algorithms for distributed learning tasks. The first, named EF-SIGNGD, uses 1-bit (sign-based) gradient quantization to reduce the number of transmitted bits; the error-feedback technique, i.e., incorporating the error made by the compression operator into the next step, is employed to guarantee convergence. The second, called LE-SIGNGD, adds a carefully designed lazy gradient aggregation rule to EF-SIGNGD that detects gradients with small changes and reuses outdated information. LE-SIGNGD therefore saves communication costs in both transmitted bits and communication rounds. Furthermore, we show that LE-SIGNGD converges under mild assumptions. The effectiveness of the two proposed algorithms is demonstrated through experiments on both real and synthetic data.
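To make the mechanism concrete, here is a minimal numpy sketch of sign-based gradient compression with error feedback in the spirit of EF-SIGNGD; the scaling choice, step size, and all names are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def sign_compress(v):
    """1-bit compressor: transmit only the signs plus one scale factor
    (the mean magnitude), so the reconstruction has a sensible length."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

def ef_signgd_step(w, grad, error, lr=0.1):
    """One step of sign-compressed gradient descent with error feedback.

    w     : current model parameters
    grad  : freshly computed (stochastic) gradient at w
    error : residual left over from the previous round's compression
    """
    corrected = grad + error             # fold the old compression error back in
    compressed = sign_compress(corrected)
    new_error = corrected - compressed   # what the 1-bit code failed to express
    return w - lr * compressed, new_error

# Tiny usage example on the quadratic f(w) = 0.5 * ||w - w_star||^2.
rng = np.random.default_rng(0)
w_star = rng.normal(size=10)
w, err = np.zeros(10), np.zeros(10)
for _ in range(200):
    grad = w - w_star                    # exact gradient of the quadratic
    w, err = ef_signgd_step(w, grad, err)
print("distance to optimum:", np.linalg.norm(w - w_star))
```

The lazy aggregation rule of LE-SIGNGD would additionally let a worker skip the upload whenever its error-corrected gradient has changed little since the last transmitted one; that check is omitted from this sketch.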
Funding: Supported by the National Natural Science Foundation of China [Grant Number 11771268].
Abstract: Zero-inflated count outcomes are common in many studies, for example claim-frequency counts in the insurance industry, where identifying and understanding excessive zeros are of interest. Moreover, with the progress of data collection and storage techniques, datasets are often too massive to be stored or processed by a single node or branch, so distributed data analysis is a blossoming area. In this paper, several communication-efficient distributed zero-inflated Poisson regression algorithms are developed to analyse such large-scale zero-inflated data. Both the asymptotic properties of the proposed estimators and the complexities of the algorithms are studied. Various simulation studies demonstrate that the proposed methods and algorithms work well and efficiently. Finally, in a case study, we apply the proposed algorithms to car insurance data from Kaggle.
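For illustration only, the sketch below implements a simple one-shot divide-and-conquer baseline for zero-inflated Poisson regression: every node maximizes its local ZIP likelihood and a central node averages the resulting estimates. This is not the paper's algorithm, and the link functions (logit for the zero-inflation part, log for the Poisson mean), the shared design matrix, and all names are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def zip_negloglik(params, X, y):
    """Negative log-likelihood of a zero-inflated Poisson model."""
    d = X.shape[1]
    gamma, beta = params[:d], params[d:]
    pi = np.clip(expit(X @ gamma), 1e-10, 1 - 1e-10)   # P(structural zero)
    lam = np.exp(np.clip(X @ beta, -20, 20))           # Poisson mean
    zero = y == 0
    ll_zero = np.log(pi[zero] + (1 - pi[zero]) * np.exp(-lam[zero]))
    ll_pos = (np.log(1 - pi[~zero]) - lam[~zero]
              + y[~zero] * np.log(lam[~zero]) - gammaln(y[~zero] + 1))
    return -(ll_zero.sum() + ll_pos.sum())

def fit_zip(X, y):
    """Local maximum-likelihood fit on one node's data."""
    d = X.shape[1]
    return minimize(zip_negloglik, np.zeros(2 * d), args=(X, y), method="BFGS").x

def one_shot_average(node_data):
    """Each node sends its 2d-dimensional estimate; the centre averages them."""
    return np.mean([fit_zip(X, y) for X, y in node_data], axis=0)

# Synthetic check: 4 nodes, each with 2000 observations.
rng = np.random.default_rng(1)
true_gamma, true_beta = np.array([-1.0, 0.5]), np.array([0.3, 0.8])
def simulate(n):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    pi, lam = expit(X @ true_gamma), np.exp(X @ true_beta)
    return X, np.where(rng.random(n) < pi, 0, rng.poisson(lam))
print("averaged (gamma, beta):", one_shot_average([simulate(2000) for _ in range(4)]))
```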
Abstract: The forthcoming sixth generation (6G) of mobile communication networks is envisioned to be AI-native, supporting intelligent services and pervasive computing at unprecedented scale. Among the key paradigms enabling this vision, Federated Learning (FL) has gained prominence as a distributed machine learning framework that allows multiple devices to collaboratively train models without sharing raw data, thereby preserving privacy and reducing the need for centralized storage. This capability is particularly attractive for vision-based applications, where image and video data are both sensitive and bandwidth-intensive. However, the integration of FL with 6G networks presents unique challenges, including communication bottlenecks, device heterogeneity, and trade-offs between model accuracy, latency, and energy consumption. In this paper, we develop a simulation-based framework to investigate the performance of FL on representative vision tasks under 6G-like environments. We formalize the system model, incorporating both the federated averaging (FedAvg) training process and a simplified communication cost model that captures bandwidth constraints, packet loss, and variable latency across edge devices. Using standard image datasets (e.g., MNIST, CIFAR-10) as benchmarks, we analyze how factors such as the number of participating clients, the degree of data heterogeneity, and the communication frequency influence convergence speed and model accuracy. Additionally, we evaluate the effectiveness of lightweight communication-efficient strategies, including local update tuning and gradient compression, in mitigating network overhead. The experimental results reveal several key insights: (i) communication limitations can significantly degrade FL convergence on vision tasks if not properly addressed; (ii) judicious tuning of local training epochs and client participation levels enables notable improvements in both efficiency and accuracy; and (iii) communication-efficient FL strategies provide a promising pathway to balance performance with the stringent latency and reliability requirements expected in 6G. These findings highlight the synergistic role of AI and next-generation networks in enabling privacy-preserving, real-time vision applications, and they provide concrete design guidelines for researchers and practitioners working at the intersection of FL and 6G.
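The sketch below illustrates the overall shape of such a simulation: FedAvg on a synthetic logistic-regression task stands in for the vision benchmarks, and the bandwidth, latency, and packet-loss figures in the communication cost model are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic federated task (a stand-in for MNIST/CIFAR-10 clients).
d, n_clients, n_per_client = 20, 10, 200
w_true = rng.normal(size=d)
def make_client():
    X = rng.normal(size=(n_per_client, d))
    y = (X @ w_true + 0.5 * rng.normal(size=n_per_client) > 0).astype(float)
    return X, y
clients = [make_client() for _ in range(n_clients)]

def local_update(w, X, y, epochs=2, lr=0.1):
    """A few epochs of full-batch logistic-regression gradient descent."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def upload_time(model_bits, bandwidth_bps, latency_s, loss_prob):
    """Expected per-client upload time: transmission plus latency,
    inflated by retransmissions under an i.i.d. packet-loss model."""
    return (model_bits / bandwidth_bps + latency_s) / (1.0 - loss_prob)

# FedAvg rounds with a straggler-dominated communication cost.
w, model_bits, total_time = np.zeros(d), 32 * d, 0.0
for _ in range(30):
    selected = rng.choice(n_clients, size=5, replace=False)
    w = np.mean([local_update(w, *clients[i]) for i in selected], axis=0)
    total_time += max(upload_time(model_bits,
                                  bandwidth_bps=rng.uniform(1e6, 1e7),
                                  latency_s=rng.uniform(0.01, 0.05),
                                  loss_prob=0.05)
                      for _ in selected)
acc = np.mean([(1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y for X, y in clients])
print(f"final accuracy ~ {acc:.3f}, simulated communication time ~ {total_time:.2f}s")
```

Replacing the synthetic task with real image clients and adding gradient compression inside `local_update` would bring the sketch closer to the experiments summarized above.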
Funding: This work is supported by the National Natural Science Foundation of China (No. 11971171), the 111 Project (B14019), and the Project of National Social Science Fund of China (15BTJ027). Weidong Liu's research is supported by the National Program on Key Basic Research Project (973 Program, 2018AAA0100704), the National Natural Science Foundation of China (Nos. 11825104 and 11690013), the Youth Talent Support Program, and a grant from the Australian Research Council. Hansheng Wang's research is partially supported by the National Natural Science Foundation of China (Nos. 11831008, 11525101, and 71532001) and in part by China's National Key Research Special Program (No. 2016YFC0207704).
Abstract: The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, while also providing opportunities for researchers to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to deal with large-scale statistical optimization problems. This paper aims to provide a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models; their key ideas and theoretical properties are summarized. The trade-off between communication cost and estimation precision, together with other concerns, is discussed.
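As a concrete instance of the divide-and-conquer idea surveyed here, the sketch below contrasts two one-shot strategies for linear regression: communicating sufficient statistics (which recovers the full-sample OLS solution exactly) versus averaging local estimates (cheaper, and exact only when the local designs coincide). The function names and toy data are illustrative.

```python
import numpy as np

def local_suff_stats(X, y):
    """Each machine sends only (X'X, X'y): O(d^2) numbers, independent of n."""
    return X.T @ X, X.T @ y

def one_shot_ols(stats):
    """The centre sums the sufficient statistics and solves once."""
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)

def averaged_ols(data):
    """Simple one-shot averaging of the local OLS estimates."""
    return np.mean([np.linalg.solve(*local_suff_stats(X, y)) for X, y in data], axis=0)

# Toy comparison: 5 machines, 1000 observations each.
rng = np.random.default_rng(7)
beta = rng.normal(size=5)
data = []
for _ in range(5):
    X = rng.normal(size=(1000, 5))
    data.append((X, X @ beta + rng.normal(size=1000)))
print("sufficient-statistics estimate:", one_shot_ols([local_suff_stats(X, y) for X, y in data]))
print("simple averaging estimate:     ", averaged_ols(data))
```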