
Multi-Core Composable Modular Server System
Cited by: 1
Abstract: To meet the requirements of diverse application scenarios and the challenge of integrating heterogeneous computing power, this paper proposes the "multi-core composable modular server", a hardware and software architecture that places the system interconnect and switching fabric, rather than the traditional CPU, at the center of the server. Heterogeneous compute units and system hardware resources are disaggregated and pooled behind standardized interface definitions. Unified control and management software integrates the differing underlying hardware, so that heterogeneous compute units can work together, resources can be allocated on demand, and the whole system can be scheduled and managed uniformly. The key technologies include high-performance non-blocking bus interconnect switching, long-distance low-latency interconnection of pooled units, disaggregated pooling of memory and storage, whole-system monitoring and management, and system resource topology management. The system fully disaggregates its hardware and composes it elastically, so a single server can host heterogeneous compute modules and allocate heterogeneous computing power and shared resources online, on demand. Experimental results show that the system achieves balanced low-latency peer-to-peer communication among 16 GPUs with linear scaling of system performance, supporting on-demand allocation of heterogeneous computing power for AI workloads. It also achieves sub-microsecond remote memory access, which extends memory bandwidth and capacity and effectively improves system performance. Finally, it shares pooled storage in fine-grained slices, meeting the needs of multi-host, high-concurrency storage applications.
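The abstract describes disaggregated resources that are composed onto hosts on demand and returned to the pool afterwards. The following is a minimal, hypothetical sketch of that bookkeeping and is not the authors' management software. It assumes that each pooled GPU is bound to at most one host at a time and that pooled storage can be modelled as interchangeable fixed-size slices. The names `ResourcePool`, `compose` and `release` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical pool of disaggregated devices behind a switch fabric.

    Assumption: each GPU is bound to at most one host at a time, and
    storage is modelled as equal-sized slices counted per host.
    """
    gpus: set[str]
    storage_slices: int
    gpu_owner: dict[str, str] = field(default_factory=dict)
    slice_owner: dict[str, int] = field(default_factory=dict)

    def free_gpus(self) -> list[str]:
        # GPUs not currently bound to any host, in a stable order.
        return sorted(g for g in self.gpus if g not in self.gpu_owner)

    def free_slices(self) -> int:
        # Storage slices not yet handed out to any host.
        return self.storage_slices - sum(self.slice_owner.values())

    def compose(self, host: str, n_gpus: int, n_slices: int) -> list[str]:
        """Bind n_gpus GPUs and n_slices storage slices to host, all or nothing."""
        free = self.free_gpus()
        if n_gpus > len(free) or n_slices > self.free_slices():
            raise RuntimeError(f"insufficient pooled resources for {host}")
        chosen = free[:n_gpus]
        for g in chosen:
            self.gpu_owner[g] = host
        self.slice_owner[host] = self.slice_owner.get(host, 0) + n_slices
        return chosen

    def release(self, host: str) -> None:
        """Return every GPU and storage slice bound to host to the pool."""
        self.gpu_owner = {g: h for g, h in self.gpu_owner.items() if h != host}
        self.slice_owner.pop(host, None)


if __name__ == "__main__":
    # Example: split 16 pooled GPUs and 64 storage slices between two hosts.
    pool = ResourcePool(gpus={f"gpu{i:02d}" for i in range(16)}, storage_slices=64)
    print(pool.compose("hostA", 8, 16))
    print(pool.compose("hostB", 8, 16))
    pool.release("hostA")
    print(pool.free_gpus(), pool.free_slices())
```

The all-or-nothing check in `compose` keeps a host from receiving a partial allocation when the pool runs short. A real system would additionally have to respect the interconnect topology when choosing which devices to bind, which this sketch ignores.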
Authors: GAO Xianyang (高显扬), WU An (吴安), CI Tanlong (慈潭龙), LI Jinfeng (李金锋), ZHAO Weikang (赵伟康) (IEIT Systems Co., Ltd., Jinan 250101, China)
Source: Computer Engineering and Applications (《计算机工程与应用》, a Peking University core journal), 2025, No. 5, pp. 344-354 (11 pages)
Keywords: multicore; modular server; composable infrastructure; disaggregated system; resource pooling; heterogeneous computing