Abstract
Wafer-scale computers integrate multiple chiplets through advanced packaging, overcoming the area limits of conventional chips to scale computing power. Existing designs, however, are domain-specific and struggle to meet general-purpose computing requirements. Targeting the workload characteristics of high-performance computing and intelligent computing, this paper proposes Yingtian-Lake (映天湖), a new general-purpose wafer-scale system architecture. First, a decoupled computing-module/interconnect-substrate architecture, combined with standardized I/O interfaces, supports multiple types of computing modules. Second, a reconfigurable wafer-scale network uses dynamic topology reconfiguration to adapt to different traffic patterns. Third, a topology-independent fault-tolerant control mechanism maintains service continuity when computing units fail. Experimental results show that the reconfigurable wafer-scale network achieves topology switching latency on the order of seconds. A prototype system with 16 computing modules, taped out and verified in a TSMC 28 nm process, shows about 1.45 times higher throughput on high-performance linear algebra workloads and about 1.78 times better latency on deep learning inference tasks, and a single wafer can deliver petaflops-level performance. These results support the feasibility of a general-purpose wafer-scale system and provide a scalable hardware foundation for next-generation heterogeneous computing platforms.
Authors
Dong Wenkuo (董文阔)
Yin Chunsuo (殷春锁)
Zhang Zhimeng (张志锰)
Wang Pengchao (王鹏超)
Sha Jiang (沙江)
Wang Mengya (王梦雅)
Zhu Minqi (朱旻琦)
Liu Hongwei (刘宏伟)
Liu Yuhang (刘宇航)
Hao Qinfen (郝沁汾)
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 58th Research Institute, China Electronics Technology Group Corporation, Wuxi, Jiangsu 214035; Wuxi Institute of Interconnect Technology, Wuxi, Jiangsu 214131
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Peking University Core Journal (北大核心)
2025, Issue 6, pp. 1492-1512 (21 pages)
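Funding
National Key Research and Development Program of China (2022YFB4401501)
Jiangsu Province Innovation Support Program (Soft Science Research) Special Project (BE2023006-4)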
Keywords
wafer-scale computer
high-performance computing
intelligent computing
standardized I/O design
reconfigurable wafer-scale network