2 articles found
Chair's Introduction to 2009 IEEE Circuits and Systems International Conference on Testing and Diagnosis
1
Author: Rueywen Liu. Journal of Electronic Science and Technology of China, 2009, No. 4, p. 289 (1 page)
Based on the recommendation of the ICTD '09 TPC members, this Special Issue of the Journal of Electronic Science & Technology of China (JESTC) contains 22 high-quality papers selected from the Proceedings of the 2009 IEEE Circuits and Systems International Conference on Testing and Diagnosis (ICTD '09). The conference is fully sponsored by the IEEE Circuits and Systems Society (CASS), technically co-sponsored by the University of Electronic Science and Technology of China (UESTC), the Chinese Institute of Electronics (CIE), and the China Instrument & Control Society (CIS), and organized by UESTC.
Keywords: IEEE; Chair's Introduction to 2009 IEEE Circuits and Systems International Conference on Testing and Diagnosis
SALTM: Accelerating Large Transformers in Multi-Device System With 2-D Model Partitioning Method
2
Authors: YIFAN SONG, YULONG MENG, BINHAN CHEN, SONG CHEN, YI KANG. Integrated Circuits and Systems, 2024, No. 3, pp. 144-156 (13 pages)
Recently, large Transformer models have achieved impressive results in various natural language processing tasks, but they require enormous numbers of parameters and intensive computation, necessitating deployment on multi-device systems. Current solutions introduce complicated topologies with dedicated high-bandwidth interconnects to reduce communication overhead. To address the complexity of the system architecture and reduce the overhead of inter-device communication, this paper proposes SALTM, a multi-device system based on a unidirectional ring topology and a 2-D model partitioning method that takes quantization and pruning into account. First, a 1-D model partitioning method is proposed to reduce the amount of communication. Then, the block distributed on each device is further partitioned in the orthogonal direction, introducing a task-level pipeline to overlap communication and computation. To further explore SALTM's performance on a real large model such as GPT-3, we develop an analytical model to evaluate performance and communication overhead. Our simulation shows that a BERT model with 110 million parameters, implemented by SALTM on four FPGAs, can achieve 9.65× and 1.12× speedups compared to CPU and GPU, respectively. The simulation also shows that the execution time of 4-FPGA SALTM is 1.52× that of an ideal system with infinite inter-device bandwidth. For GPT-3 with 175 billion parameters, our analytical model predicts that SALTM comprising 16 VC1502 FPGAs and 16 A30 GPUs can achieve inference latency of 287 ms and 164 ms, respectively.
Keywords: circuits and systems; computer architecture; field programmable gate arrays; neural network hardware; parallel architectures