Accelerating Distributed Training of Large Concurrent-Branch Models through Bidirectional Pipeline Coordination
Authors: Zan Zong, Yuyang Chen, Qi Zhang, Daming Zhao, Jianjiang Li, Yijun Jing, Jidong Zhai. Tsinghua Science and Technology, 2025, No. 6, pp. 2638-2652 (15 pages).
Large models have been widely used in fields such as natural language processing and information retrieval. As large models have developed, not only has the parameter scale increased, but the model architecture has also become more complex. For example, multi-modal transformer-based models typically contain concurrent branches, which we denote as concurrent-branch models (CBMs). Many CBMs have grown to tens of billions of parameters and require distributed resources for training. Existing distributed training systems cannot fully handle this type of model architecture because of the interactions between branches. Inspired by the unbalanced resource usage of pipeline parallelism, we organize the different branches with a fine-grained bidirectional pipeline schedule of communication and computation. However, improper coordination between branches leads to idle computation time and low training efficiency. In this paper, we present Flexpipe, a pipeline engine for concurrent-branch models. We first introduce branch-aware pipeline parallelism (BAPP) to make full use of the concurrent characteristics of the model architecture. Then, based on a multi-branch pipeline simulator, we propose an adaptive interaction coordinator, which facilitates low-overhead branch interactions during distributed model training. We evaluate our approach on popular concurrent-branch models combined with modern training systems. Compared with Chimera, the experimental results show that our method improves end-to-end training throughput by 20% on average.
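The core idea in the abstract, overlapping the pipelines of concurrent branches in opposite directions so that each stage stays busy, can be illustrated with a toy scheduler. This is a hypothetical sketch only, not the paper's Flexpipe/BAPP implementation: the unit-time stages, the absence of branch interactions, and the greedy tick-based schedule are all simplifying assumptions introduced here for illustration.

```python
# Toy model (NOT the paper's Flexpipe code): two branches share P pipeline
# stages. Branch A's micro-batches traverse stages 0..P-1 while branch B's
# traverse P-1..0, so the two directions fill each other's idle stages.
# Unit-time stages and no inter-branch interactions are assumptions.

def bidirectional_makespan(num_stages, num_microbatches):
    """Greedy tick-based schedule; each stage runs at most one task per tick."""
    # progress[(branch, m)] = number of stages micro-batch m has completed
    progress = {(b, m): 0 for b in "AB" for m in range(num_microbatches)}
    done, t = set(), 0
    while len(done) < 2 * num_microbatches:
        busy = set()  # stages already occupied during this tick
        for (branch, m), step in sorted(progress.items()):
            if (branch, m) in done:
                continue
            # Branch A ascends the stages; branch B descends them.
            stage = step if branch == "A" else num_stages - 1 - step
            if stage not in busy:
                busy.add(stage)
                progress[(branch, m)] = step + 1
                if step + 1 == num_stages:
                    done.add((branch, m))
        t += 1
    return t

if __name__ == "__main__":
    P, M = 4, 4
    overlapped = bidirectional_makespan(P, M)
    back_to_back = 2 * (P + M - 1)  # branch A's pipeline, then branch B's
    print(overlapped, back_to_back)
```

In this simplified setting, overlapping the two branches' schedules finishes sooner than running the branch pipelines back to back, which is the intuition behind filling the idle time of one direction with work from the other; the paper's coordinator additionally handles the branch interactions this sketch ignores.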
Keywords: parallel training system, pipeline parallelism, large model framework