Journal articles: 3 results found
1. Adaptive Load Balancing for Parameter Servers in Distributed Machine Learning over Heterogeneous Networks (cited by: 1)
Authors: CAI Weibo, YANG Shulin, SUN Gang, ZHANG Qiming, YU Hongfang. ZTE Communications, 2023, Issue 1, pp. 72-80 (9 pages)
In distributed machine learning (DML) based on the parameter server (PS) architecture, unbalanced communication load distribution across PSs leads to a significant slowdown of model synchronization in heterogeneous networks due to low bandwidth utilization. To address this problem, a network-aware adaptive PS load distribution scheme is proposed, which accelerates model synchronization by proactively adjusting the communication load on PSs according to network states. We evaluate the proposed scheme on MXNet, a real-world distributed training platform, and the results show that our scheme achieves up to a 2.68 times speed-up of model training in dynamic and heterogeneous network environments.
Keywords: distributed machine learning; network awareness; parameter server; load distribution; heterogeneous network
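The core idea of such a scheme is to shift parameter shards toward parameter servers with more available bandwidth. Below is a minimal, hypothetical Python sketch of a bandwidth-proportional placement policy; the names assign_partitions, bandwidth_mbps, and partition_sizes_mb are illustrative assumptions and do not reproduce the paper's actual algorithm or its MXNet integration.

# Sketch: assign parameter partitions to parameter servers so that each
# PS's estimated transfer time (assigned bytes / measured bandwidth)
# stays roughly balanced. All names here are hypothetical.
from typing import Dict, List

def assign_partitions(bandwidth_mbps: Dict[str, float],
                      partition_sizes_mb: List[float]) -> Dict[str, List[int]]:
    assignment = {ps: [] for ps in bandwidth_mbps}
    load_mb = {ps: 0.0 for ps in bandwidth_mbps}
    # Place the largest partitions first, each on the PS whose estimated
    # transfer time would be smallest after receiving it.
    for idx in sorted(range(len(partition_sizes_mb)),
                      key=lambda i: partition_sizes_mb[i], reverse=True):
        best = min(bandwidth_mbps,
                   key=lambda ps: (load_mb[ps] + partition_sizes_mb[idx]) / bandwidth_mbps[ps])
        assignment[best].append(idx)
        load_mb[best] += partition_sizes_mb[idx]
    return assignment

if __name__ == "__main__":
    # Hypothetical measured bandwidths for three parameter servers.
    bw = {"ps0": 10_000.0, "ps1": 2_500.0, "ps2": 5_000.0}
    sizes = [64.0, 32.0, 32.0, 16.0, 16.0, 8.0]
    print(assign_partitions(bw, sizes))

Re-running the placement whenever measured bandwidth changes is one simple way to make the load distribution adaptive to network state.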
2. DRPS: efficient disk-resident parameter servers for distributed machine learning
Authors: Zhen Song, Yu Gu, Zhigang Wang, Ge Yu. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, Issue 4, pp. 79-90 (12 pages)
The parameter server (PS), as the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems often depend on in-memory implementations. With memory constraints, machine learning (ML) developers cannot train large-scale ML models in their rather small local clusters. Moreover, renting large-scale cloud servers is always economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between the problem of inconsistent parameter versions (staleness) and that of inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
Keywords: parameter servers; machine learning; disk resident; parallel model
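DRPS's central engineering move is to keep high-dimensional models on disk while an in-memory index keeps lookups cheap. The following is a minimal sketch assuming an append-only parameter file plus a key-to-offset dictionary; the class name DiskParamStore and its methods are hypothetical and do not implement the paper's actual index structure, multi-objective partitioning, or WSP model.

# Sketch: a disk-resident parameter store. Parameter blocks live in a
# binary file; an in-memory dict maps each key to (offset, length) so a
# read costs a single seek. All names are illustrative assumptions.
import os
import struct
from typing import List

class DiskParamStore:
    def __init__(self, path: str):
        self.path = path
        self.index = {}                    # key -> (byte offset, value count)
        open(self.path, "ab").close()      # create the file if it is missing

    def put(self, key: str, values: List[float]) -> None:
        with open(self.path, "ab") as f:
            offset = f.tell()
            f.write(struct.pack(f"{len(values)}d", *values))
        self.index[key] = (offset, len(values))   # latest write wins

    def get(self, key: str) -> List[float]:
        offset, n = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return list(struct.unpack(f"{n}d", f.read(8 * n)))

if __name__ == "__main__":
    store = DiskParamStore("params.bin")
    store.put("layer1.weight", [0.1, 0.2, 0.3])
    print(store.get("layer1.weight"))
    os.remove("params.bin")

How keys are grouped into files or pages is exactly where a partitioning algorithm such as the one the paper proposes would come in, since co-locating parameters that are read together reduces the number of disk seeks.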
3. Impact of data set noise on distributed deep learning
Authors: Guo Qinghao, Shuai Liguo, Hu Sunying. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2020, Issue 2, pp. 37-45 (9 pages)
Training efficiency and test accuracy are important factors in judging the scalability of distributed deep learning. In this paper, the impact of noise introduced into the Modified National Institute of Standards and Technology (MNIST) and CIFAR-10 datasets, which are selected as benchmarks for distributed deep learning, is explored. The noise in the training set is manually divided into cross-noise and random noise, and each type of noise is applied at different ratios in the dataset. To minimize the influence of parameter interactions in distributed deep learning, we choose a compressed model (SqueezeNet) together with the proposed flexible communication method, which reduces the communication frequency, and we evaluate the influence of noise on distributed training under both synchronous and asynchronous stochastic gradient descent. On the TensorFlowOnSpark experimental platform, we obtain the training accuracy at different noise ratios and the training time for different numbers of nodes. The presence of cross-noise in the training set not only decreases the test accuracy but also increases the time required for distributed training, and the noise degrades the scalability of distributed deep learning.
Keywords: distributed deep learning; stochastic gradient descent; parameter server (PS); dataset noise
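To illustrate how controlled label noise can be injected before training, the sketch below assumes that "random noise" relabels a chosen fraction of samples to a uniformly random class and that "cross-noise" swaps labels between one fixed pair of classes; these interpretations, the function names, and the parameters are assumptions, and the paper's exact noise definitions, SqueezeNet model, and TensorFlowOnSpark setup are not reproduced.

# Sketch: inject label noise into a training set at a given ratio.
# The noise definitions here are assumptions for illustration only.
import numpy as np

def add_random_noise(labels, ratio, num_classes, seed=0):
    # Relabel a random `ratio` fraction of samples to a uniformly random class.
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    idx = rng.choice(len(labels), size=int(ratio * len(labels)), replace=False)
    noisy[idx] = rng.integers(0, num_classes, size=len(idx))
    return noisy

def add_cross_noise(labels, ratio, class_a, class_b, seed=0):
    # Swap labels between two specific classes for a `ratio` fraction of
    # the samples belonging to either class.
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    candidates = np.where((labels == class_a) | (labels == class_b))[0]
    idx = rng.choice(candidates, size=int(ratio * len(candidates)), replace=False)
    noisy[idx] = np.where(noisy[idx] == class_a, class_b, class_a)
    return noisy

if __name__ == "__main__":
    y = np.random.default_rng(1).integers(0, 10, size=1000)
    print((add_random_noise(y, 0.2, 10) != y).mean())   # roughly 0.18 of labels change
    print((add_cross_noise(y, 0.2, 3, 5) != y).mean())

The noisy label arrays can then be fed to the same synchronous or asynchronous SGD training jobs as the clean data, so that accuracy and training time can be compared across noise ratios and worker counts.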