Abstract
As model parameter counts surpass the trillion scale, intelligent computing interconnection faces technical challenges including ultra-large-scale networking, low-latency communication, and high-bandwidth synchronization. A multidimensional evaluation framework incorporating metrics such as throughput, latency, and scaling ratio was established, and the distinctive requirements of three major application scenarios (large-scale model training, artificial intelligence (AI) inference, and edge computing) were analyzed. Through comparative analysis of solutions from leading technology enterprises, innovative practices such as the CLOS architecture and Fat-Tree topology were summarized, with discussion focused on key technologies including interconnection protocols, network topologies, and congestion control. Future development directions such as open protocols and optoelectronic integration were also outlined. The findings indicate that continuous innovation in intelligent computing interconnection technologies will provide crucial infrastructure support for AI development.
Authors
张云勇
闫硕
陈永铭
张启明
ZHANG Yunyong; YAN Shuo; CHEN Yongming; ZHANG Qiming (Yunnan Branch of China United Network Communications Co., Ltd., Kunming 650206, China; Zhongxing Telecommunication Equipment Corporation, Kunming 650034, China)
Source
《电信科学》
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 22-32 (11 pages)
Telecommunications Science
Keywords
AI
intelligent computing interconnection
large-scale model training
network topology
optoelectronic integration