General-purpose edge neural networks need a lightweight architecture that balances storage and computing resources. However, SRAM-based computing-in-memory (CIM) architectures struggle to provide adequate on-chip storage while meeting computing requirements. To overcome this, we introduce a new MRAM-based near-memory computing (NMC) architecture. It retains the cost-effective data access of CIM while separating storage and computing at the macro level, which improves deployment adaptability. We refine the NMC macro by incorporating a small temporary storage and adopting a layer-fusion approach to improve data-transfer efficiency. By integrating a high-capacity MRAM into the macro, we attain a storage density of 0.532 µm²/bit. Moreover, we enhance the adder tree with a shift module, supporting multiply-and-accumulate (MAC) operations at five distinct depths (8, 9, 16, 32, and 64), which raises resource utilization efficiency to 88.3%. Our architecture achieves an on-chip storage density of 1.49 Mb/mm² and an energy efficiency of 6.164 TOPS/W.
Funding: Supported in part by the Open Project Program of Anhui Province Key Laboratory of Spintronic Chip Research and Manufacturing under Grant WNKFKT-25-01, in part by the National Science Foundation of China under Grant 62104025, and in part by the State Key Laboratory of Computer Architecture (ICT, CAS) under Grant CLQ202305.
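To make the configurable MAC depths in the abstract concrete, the following is a minimal functional sketch, not the authors' design. It assumes a 64-lane adder tree (the lane count `LANES` is an assumption, chosen to match the largest listed depth) and models only the visible effect of a depth setting: consecutive lanes are grouped, and each group yields one dot product. The shift module itself is not modeled, and the helper names `grouped_mac` and `lane_utilization` are hypothetical. The packing rule used here (whole groups only, leftover lanes idle) is likewise an assumption and is not meant to reproduce the reported 88.3% figure.

```python
from typing import List

LANES = 64  # assumed adder-tree width; not stated in the abstract
SUPPORTED_DEPTHS = (8, 9, 16, 32, 64)  # MAC depths listed in the abstract


def grouped_mac(weights: List[int], activations: List[int], depth: int) -> List[int]:
    """Return one dot product per group of `depth` consecutive lanes.

    Lanes that do not fill a whole group are ignored (assumed idle).
    """
    if depth not in SUPPORTED_DEPTHS:
        raise ValueError(f"unsupported depth: {depth}")
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    if len(weights) > LANES:
        raise ValueError(f"at most {LANES} lanes are available")

    groups = len(weights) // depth
    results = []
    for g in range(groups):
        lo, hi = g * depth, (g + 1) * depth
        # Multiply lane by lane, then accumulate within the group.
        results.append(sum(w * a for w, a in zip(weights[lo:hi], activations[lo:hi])))
    return results


def lane_utilization(depth: int, lanes: int = LANES) -> float:
    """Fraction of lanes doing useful work when only whole groups are packed."""
    return (lanes // depth) * depth / lanes


if __name__ == "__main__":
    w = [1] * LANES
    a = list(range(LANES))
    print(grouped_mac(w, a, 32))   # two partial sums, one per 32-lane group
    print(lane_utilization(9))     # 7 groups of 9 occupy 63 of 64 lanes
```

Under these assumptions, depth 9 (a natural fit for 3×3 kernels) packs seven groups into 64 lanes, leaving one lane idle, while the power-of-two depths fill the tree completely.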