Funding: supported in part by the National Natural Science Foundation of China (62422405, 62025111, 62495100, 92464302), the STI 2030-Major Projects (2021ZD0201200), the Shanghai Municipal Science and Technology Major Project, and the Beijing Advanced Innovation Center for Integrated Circuits.
Abstract: Computing-in-memory (CIM) has been a promising candidate for artificial-intelligence applications thanks to the absence of data transfer between computation and storage blocks. Resistive random access memory (RRAM)-based CIM has the advantages of high computing density, non-volatility, and high energy efficiency. However, previous CIM research has predominantly focused on realizing high energy efficiency and high area efficiency for inference, while little attention has been devoted to the challenges of on-chip programming speed, power consumption, and accuracy. In this paper, a fabricated 28 nm 576K RRAM-based CIM macro featuring optimized on-chip programming schemes is proposed to address these issues. Different strategies for mapping weights to RRAM arrays are compared, and a novel direct-current ADC is designed for both the programming and inference stages. Utilizing the optimized hybrid programming scheme, 4.67× faster programming, a 0.15× power consumption, and a 4.31× more compact weight distribution are realized. In addition, the macro achieves a normalized area efficiency of 2.82 TOPS/mm² and a normalized energy efficiency of 35.6 TOPS/W.
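The weight-mapping and readout flow such a macro relies on can be illustrated with an idealized behavioral model. The sketch below is a rough illustration, not the paper's actual circuit: signed weights are mapped onto differential conductance pairs within an assumed device window, and the matrix-vector product is computed as per-row bitline current differences. Device noise, IR drop, and ADC quantization are ignored, and all constants are placeholders.

```python
import numpy as np

# Assumed conductance window of the RRAM cells (placeholder values, siemens).
G_MIN, G_MAX = 1e-6, 50e-6

def weights_to_conductance(w):
    # Map each signed weight onto a (G_pos, G_neg) differential pair so that
    # G_pos - G_neg is proportional to the weight.
    scale = (G_MAX - G_MIN) / np.abs(w).max()
    g_pos = np.where(w > 0, G_MIN + w * scale, G_MIN)
    g_neg = np.where(w < 0, G_MIN - w * scale, G_MIN)
    return g_pos, g_neg

def cim_matvec(w, x, v_read=0.2):
    # Idealized in-memory MVM: apply a read voltage scaled by the inputs and
    # sum the resulting cell currents along each differential bitline pair.
    g_pos, g_neg = weights_to_conductance(w)
    return v_read * (g_pos @ x - g_neg @ x)  # proportional to w @ x

w = np.array([[0.5, -1.0], [2.0, 0.25]])
x = np.array([1.0, 1.0])
i_out = cim_matvec(w, x)
```

Because G_pos − G_neg equals the weight times a fixed scale for every cell, the current vector is an exact scaled copy of `w @ x` in this noiseless model; real macros deviate from it through conductance variation and readout quantization.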
Funding: supported by the National Natural Science Foundation of China (92164206, 52261145694, T2394474, T2394470, 623B2015, 62271026, 62401026, and 62404013), the National Key Research and Development Program of China (2022YFB4400200), the New Cornerstone Science Foundation through the XPLORER PRIZE, the National Postdoctoral Program for Innovative Talents (BX20220374 and BX20240455), and the China Postdoctoral Science Foundation Funded Project (2023M740177 and 2022M720345).
Abstract: In recent years, the physical unclonable function (PUF) has emerged as a lightweight solution for Internet of Things security. However, conventional PUFs based on complementary metal oxide semiconductor (CMOS) technology present challenges such as insufficient randomness, significant power and area overhead, and vulnerability to environmental factors, leading to reduced reliability. In this study, we realize a strong, highly reliable, and reconfigurable PUF with resistance against machine-learning attacks in a 1 kb spin-orbit torque magnetic random access memory fabricated in a 180 nm CMOS process. This strong PUF achieves a challenge-response pair capacity of 10⁹ through a computing-in-memory approach. The results demonstrate that the proposed PUF exhibits near-ideal performance metrics: 50.07% uniformity, 50% diffuseness, 49.89% uniqueness, and a bit error rate of 0%, even in a 375 K environment. The reconfigurability of the PUF is demonstrated by a reconfigurable Hamming distance of 49.31% and a correlation coefficient of less than 0.2, making it difficult to extract output keys through side-channel analysis. Furthermore, resistance to machine-learning modeling attacks is confirmed by a prediction accuracy of approximately 50% (the ideal value) on the test set.
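The quality metrics quoted above (uniformity, uniqueness, bit error rate) have standard definitions over response bitstrings, which can be sketched as follows. This is a generic illustration with synthetic random responses, not the paper's measured data, and the function names are ours.

```python
import numpy as np

def uniformity(resp):
    # Percentage of 1s in a single PUF response; the ideal value is 50%.
    return float(resp.mean()) * 100

def uniqueness(responses):
    # Mean pairwise inter-device Hamming distance (%); the ideal value is 50%.
    n = len(responses)
    dists = [float(np.mean(responses[i] != responses[j])) * 100
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

def bit_error_rate(ref, reread):
    # Percentage of bits that flip between a reference read and a re-read;
    # the ideal value is 0%.
    return float(np.mean(ref != reread)) * 100

# Synthetic stand-in for measured responses from 8 devices.
rng = np.random.default_rng(0)
chips = [rng.integers(0, 2, 1024) for _ in range(8)]
inter_hd = uniqueness(chips)  # close to 50 for independent random responses
```

A re-read of the same device feeds `bit_error_rate`, while responses from different devices feed `uniqueness`; the reported 49.89% uniqueness and 0% bit error rate correspond to near-ideal values of these two quantities.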
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62034006, 92264201, and 91964105), the Natural Science Foundation of Shandong Province (Nos. ZR2020JQ28 and ZR2020KF016), and the Program of Qilu Young Scholars of Shandong University.
Abstract: With the rapid development of machine learning, the demand for highly efficient computing becomes increasingly urgent. To break the bottleneck of the traditional von Neumann architecture, computing-in-memory (CIM) has attracted increasing attention in recent years. In this work, to provide a feasible CIM solution for large-scale neural networks (NNs) requiring continuous weight updating in online training, a flash-based CIM with high endurance (10⁹ cycles) and ultrafast programming speed is investigated. On the one hand, the proposed programming scheme of channel hot electron injection (CHEI) and hot hole injection (HHI) demonstrates high linearity and symmetric potentiation and depression processes, which help to improve training speed and accuracy. On the other hand, the low-damage programming scheme and memory window (MW) optimizations suppress cell degradation effectively with improved computing accuracy. Even after 10⁹ cycles, the leakage current (I_off) of the cells remains below 10 pA, ensuring the large-scale computing ability of the memory. Further characterizations of read disturb demonstrate its robust reliability. By processing CIFAR-10 tasks, ~90% accuracy can be achieved after 10⁹ cycles in both ResNet50 and VGG16 NNs. Our results suggest that flash-based CIM has great potential to overcome the limitations of traditional von Neumann architectures and enable high-performance NN online training, paving the way for further development of artificial intelligence (AI) accelerators.
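The linearity and symmetry of potentiation and depression matter because nonlinear conductance updates distort gradient steps during online training. A common behavioral model from the CIM literature captures this; note that this is a generic model, not this paper's fitted device data, and the nonlinearity factor `a` is an assumed parameter.

```python
import numpy as np

def potentiation_curve(n_pulses, g_min, g_max, a):
    # Conductance vs. pulse number as a saturating exponential; `a` controls
    # the nonlinearity (a -> 0 recovers the ideal linear update).
    n = np.arange(n_pulses + 1)
    if a == 0:
        return g_min + (g_max - g_min) * n / n_pulses
    b = (g_max - g_min) / (1 - np.exp(-n_pulses / a))
    return g_min + b * (1 - np.exp(-n / a))

# Placeholder conductance window and pulse count.
lin = potentiation_curve(64, 1e-6, 50e-6, 0)   # ideal linear device
nl = potentiation_curve(64, 1e-6, 50e-6, 10)   # strongly nonlinear device
```

In the nonlinear case, most of the conductance change is spent in the first few pulses, so identical programming pulses produce unequal weight increments; a scheme with high update linearity, as claimed above, keeps each pulse's increment nearly constant and hence keeps training updates faithful to the computed gradients.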
Funding: supported by the National Natural Science Foundation of China (62104066, 52221001, 62090035, U19A2090, U22A20138, 52372146, and 62101181), the National Key R&D Program of China (2022YFA1402501, 2022YFA1204300), the Natural Science Foundation of Hunan Province (2021JJ20016), the Science and Technology Innovation Program of Hunan Province (2021RC3061), the Key Program of the Science and Technology Department of Hunan Province (2019XK2001, 2020XK2001), the Open Project Program of the Wuhan National Laboratory for Optoelectronics (2020WNLOKF016), the Open Project Program of the Key Laboratory of Nanodevices and Applications, Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences (22ZS01), the China Postdoctoral Science Foundation (2023TQ0110), and the Innovation Project of Optics Valley Laboratory (OVL2023ZD002).
Abstract: The growth of data and the Internet of Things challenges traditional hardware, which encounters efficiency and power issues owing to separate functional units for sensing, memory, and computation. In this study, we designed an α-phase indium selenide (α-In₂Se₃) transistor, using this two-dimensional ferroelectric semiconductor as the channel material, to create artificial optic-neural and electro-neural synapses, enabling cutting-edge processing-in-sensor (PIS) and computing-in-memory (CIM) functionalities. As an optic-neural synapse for low-level sensory processing, the α-In₂Se₃ transistor exhibits a high photoresponsivity (2855 A/W) and detectivity (2.91×10¹⁴ Jones), facilitating efficient feature extraction. For high-level processing tasks as an electro-neural synapse, it offers a fast program/erase speed of 40 ns/50 μs and an ultralow energy consumption of 0.37 aJ/spike. An AI vision system using α-In₂Se₃ transistors has been demonstrated, achieving a recognition accuracy of 92.63% within 12 epochs owing to the synergistic combination of the PIS and CIM functionalities. This study demonstrates the potential of the α-In₂Se₃ transistor for future vision hardware, enhancing processing, power efficiency, and AI applications.
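As a back-of-envelope check on how responsivity relates to detectivity, the standard shot-noise-limited relation D* = R·√A / √(2·q·I_dark) can be evaluated. Only the responsivity below is taken from the abstract; the device area and dark current are assumed placeholders, so the computed value is illustrative only and does not reproduce the reported 2.91×10¹⁴ Jones.

```python
import math

# Shot-noise-limited specific detectivity: D* = R * sqrt(A) / sqrt(2*q*I_dark)
q = 1.602e-19    # elementary charge, C
R = 2855.0       # responsivity, A/W (from the abstract)
A = 1e-6         # photosensitive area, cm^2 (assumed placeholder)
I_dark = 1e-12   # dark current, A (assumed placeholder)

D_star = R * math.sqrt(A) / math.sqrt(2 * q * I_dark)  # Jones (cm·Hz^0.5/W)
```

The relation makes the trade-off explicit: high responsivity helps detectivity linearly, while dark current hurts only as its square root, which is why low-leakage ferroelectric channels are attractive for optic-neural sensing.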
Funding: supported in part by the National Key R&D Program of China under Grant 2023YFB450220, in part by the National Natural Science Foundation of China under Grants 62174110 and 62104025, and in part by the Natural Science Foundation of Shanghai under Grant 23ZR1433200.
Abstract: SRAM-based computing-in-memory (SRAM-CIM) is expected to solve the “Memory Wall” problem. In the digital domain, SRAM-CIM has utilized full-precision digital logic to achieve high computational accuracy. However, the energy and area efficiency advantages of CIM cannot be fully exploited under error-resilient neural networks (NNs) with a given quantization bit-width. Therefore, an all-digital Bit-wise Approximate compressor configurable In-SRAM-computing macro for Energy-efficient NN acceleration, with a data-aware weight Remapping method (BASER), is proposed in this paper. Leveraging the error resilience of NNs, six energy-efficient bit-wise compressor configurations are presented under 4b/4b and 3b/3b NN quantization, respectively. Concurrently, a data-aware weight remapping approach is proposed to further enhance NN accuracy without supplementary retraining. Evaluations of VGG-9 and ResNet-18 on the CIFAR-10 and CIFAR-100 datasets show that the proposed BASER achieves 1.35× and 1.29× improvements in energy efficiency, with limited accuracy loss and improved NN accuracy, compared to previous full-precision and approximate SRAM-CIM designs, respectively.
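The idea behind bit-wise approximate compression (trading occasional miscounts of partial-product bits for much simpler logic) can be illustrated with a toy 4:2-style compressor. This is a generic example of the technique, not one of BASER's six actual configurations: the sum bit is the input parity and the carry bit fires when at least two inputs are 1, so 2·carry + sum equals the true one-count for every pattern except all-ones.

```python
from itertools import combinations, product

def approx_compress(bits):
    # Encode the one-count of four partial-product bits into two outputs:
    # sum = parity of the inputs, carry = "at least two inputs are 1".
    s = bits[0] ^ bits[1] ^ bits[2] ^ bits[3]
    c = int(any(a & b for a, b in combinations(bits, 2)))
    return 2 * c + s

# Exhaustively compare against the exact one-count over all 16 inputs.
errors = sum(approx_compress(b) != sum(b)
             for b in product((0, 1), repeat=4))
# errors == 1: only the all-ones input (true count 4) is encoded as 2
```

Because the single failing pattern is rare and the resulting error sits in a low-significance partial product, error-resilient quantized NNs tolerate it, which is exactly the property approximate-compressor CIM macros exploit to cut adder-tree energy.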