New electronic devices based on the physical properties of electrically driven skyrmions are promising for logic computing and nonvolatile memory applications. However, achieving efficient and practical compute-storage integration remains challenging owing to the structural complexity, limited functionality, and low flexibility of most skyrmion-based devices. In this study, we designed a novel device architecture that integrates seven basic logic gates into a unified physical structure. Their operation is enabled by physical mechanisms such as spin-orbit torque, spin-transfer torque, skyrmion-edge repulsion, and skyrmion-skyrmion interactions. Furthermore, by incorporating voltage-controlled magnetic anisotropy, the device achieved multi-input capability and reconfigurability. Ultralow power consumption (<1 fJ/bit per logic function) and extremely high logic density were achieved. Significantly, the compatibility of this nanotrack design with existing skyrmion racetrack memory paves the way for advanced in-memory computing in spintronic architectures.
Layer pseudospins, exhibiting quantum coherence and precise multistate controllability, present significant potential for the advancement of future computing technologies. In this work, we propose an in-memory probabilistic computing scheme based on the electrical manipulation of layer pseudospins in layered materials, by exploiting the interaction between real spins and layer pseudospins.
Photonic platforms are gradually emerging as a promising option to meet the ever-growing demand for artificial intelligence, among which photonic time-delay reservoir computing (TDRC) is widely anticipated. Although such a computing paradigm can employ just a single photonic device as the nonlinear node for data processing, its performance relies heavily on the fading memory provided by the delay feedback loop (FL), which restricts the extensibility of physical implementations, especially for highly integrated chips. Here, we present a simplified photonic scheme for more flexible parameter configurations that leverages a specially designed quasi-convolution coding (QC) and completely removes the dependence on the FL. Unlike delay-based TDRC, the encoded data in QC-based RC (QRC) enables temporal feature extraction, providing augmented memory capabilities. Thus, our proposed QRC can handle time-related tasks or sequential data without implementing an FL. Furthermore, this hardware can be implemented with a low-power, easily integrable vertical-cavity surface-emitting laser for high-performance parallel processing. We validate the concept through simulation and experimental comparison of QRC and TDRC, in which the structurally simpler QRC outperforms TDRC across various benchmark tasks. Our results may point to a promising solution for the hardware implementation of deep neural networks.
Resistive random-access memory (RRAM) devices, also known as memristors, have a very simple two-terminal structure and fulfill almost all of the fundamental requirements of volatile memory, nonvolatile memory, and neuromorphic characteristics. Their memory and neuromorphic behaviors are currently being explored in a range of materials, such as biological materials, perovskites, 2D materials, and transition metal oxides. In this review, we discuss the different electrical behaviors exhibited by RRAM devices based on these materials by briefly explaining their corresponding switching mechanisms. We then discuss emergent memory technologies using memristors, together with their potential neuromorphic applications, by elucidating the different material engineering techniques used during device fabrication to improve memory and neuromorphic performance in areas such as I_ON/I_OFF ratio, endurance, spike-timing-dependent plasticity (STDP), and paired-pulse facilitation (PPF), among others. The emulation of essential biological synaptic functions realized in various switching materials, including inorganic metal oxides and new organic materials, as well as in diverse device structures such as single-layer devices, multilayer hetero-structured devices, and crossbar arrays, is analyzed in detail. Finally, we discuss current challenges and future prospects for the development of memristors based on inorganic and new materials.
Artificial intelligence (AI) processes data-centric applications with minimal effort. However, it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications because computation and storage are separated. The emergence of computing-in-memory (CIM) is significant in circumventing the von Neumann bottleneck. A commercialized memory architecture, static random-access memory (SRAM), is fast and robust, consumes less power, and is compatible with state-of-the-art technology. This study surveys the research progress of SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.
In the past decade, there has been tremendous progress in integrating chalcogenide phase-change materials (PCMs) on the silicon photonic platform for applications ranging from non-volatile memory to neuromorphic in-memory computing. In particular, these non-von Neumann computational elements and systems benefit from mass manufacturing of silicon photonic integrated circuits (PICs) on 8-inch wafers using a 130 nm complementary metal-oxide-semiconductor line. Chip manufacturing based on deep-ultraviolet lithography and electron-beam lithography enables rapid prototyping of PICs, which can be integrated with high-quality PCMs deposited by wafer-scale sputtering as a back-end-of-line process. In this article, we present an overview of recent advances in waveguide-integrated PCM memory cells, functional devices, and neuromorphic systems, with an emphasis on the fabrication and integration processes needed to attain state-of-the-art device performance. After a short overview of PCM-based photonic devices, we discuss the material properties of the functional layer as well as progress on the light-guiding layer, namely the silicon and germanium waveguide platforms. Next, we discuss the cleanroom fabrication flow of waveguide devices integrated with thin films and nanowires, and of silicon waveguides and plasmonic microheaters for the electrothermal switching of PCMs and mixed-mode operation. Finally, the fabrication of photonic and photonic–electronic neuromorphic computing systems is reviewed. These systems consist of arrays of PCM memory elements for associative learning, matrix-vector multiplication, and pattern recognition. With large-scale integration, the neuromorphic photonic computing paradigm holds the promise of outperforming digital electronic accelerators by taking advantage of ultra-high bandwidth, high speed, and energy-efficient operation in running machine learning algorithms.
The "memory wall" of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution, and in-memory computing (IMC) architecture is a promising approach to breaking this bottleneck. Although variations and instability in ultra-scaled memory cells seriously degrade calculation accuracy in IMC architectures, stochastic computing (SC) can compensate for these shortcomings because of its low sensitivity to cell disturbances. Furthermore, massively parallel computing can be performed to improve the speed and efficiency of the system. In this paper, by designing logic functions in NOR flash arrays, SC in IMC for image edge detection is realized, demonstrating ultra-low computational complexity and power consumption (25.5 fJ/pixel at 2-bit sequence length). More impressively, the noise immunity is 6 times higher than that of the traditional binary method, showing good tolerance to cell variation and reliability degradation when implementing massively parallel computation in the array.
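As a rough illustration of why stochastic computing needs only simple logic and tolerates bit-level disturbances, the following minimal sketch encodes values as random bitstreams and computes with single gates. It is not the NOR flash circuit of the abstract: the function names, the unipolar encoding, and the choice of AND for multiplication and XOR on correlated streams for the absolute difference (an operation commonly used in SC edge detectors) are assumptions made for illustration.

```python
import random


def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def correlated_streams(pa, pb, length, rng):
    """Encode two values with one shared random number per bit (maximal correlation)."""
    a, b = [], []
    for _ in range(length):
        r = rng.random()
        a.append(1 if r < pa else 0)
        b.append(1 if r < pb else 0)
    return a, b


def from_stream(bits):
    """Decode a bitstream: the fraction of ones estimates the encoded value."""
    return sum(bits) / len(bits)


def sc_multiply(a, b):
    """AND of two independent streams estimates the product pa * pb."""
    return [x & y for x, y in zip(a, b)]


def sc_abs_diff(a, b):
    """XOR of two maximally correlated streams estimates |pa - pb|."""
    return [x ^ y for x, y in zip(a, b)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n = 20000
    product = from_stream(sc_multiply(to_stream(0.5, n, rng), to_stream(0.4, n, rng)))
    a, b = correlated_streams(0.8, 0.3, n, rng)
    print("0.5 * 0.4   ~", product)
    print("|0.8 - 0.3| ~", from_stream(sc_abs_diff(a, b)))
```

Because every bit carries equal weight, flipping a few bits shifts the decoded value only slightly, whereas a flipped high-order bit in a binary word can change the value drastically. The cost is that accuracy improves only slowly with stream length.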
Given the computing demands of the Internet of Things (IoT) and artificial intelligence (AI), the cost of moving data between the central processing unit (CPU) and memory is the key problem, and a chip featuring flexible structural units, ultra-low power consumption, and massive parallelism will be needed. In-memory computing, a non-von Neumann architecture that fuses memory units and computing units, can eliminate data transfer time and energy consumption while performing massively parallel computations. Prototype in-memory computing schemes adapted from different memory technologies have shown orders-of-magnitude improvements in computing efficiency, leading in-memory computing to be regarded as the ultimate computing paradigm. Here we review the state-of-the-art memory device technologies with potential for in-memory computing, summarize their versatile applications in neural networks, stochastic generation, and hybrid-precision digital computing, which offer promising solutions for unprecedented computing tasks, and discuss the challenges of stability and integration for general in-memory computing.
Developing efficient neural network (NN) computing systems is crucial in the era of artificial intelligence (AI). Traditional von Neumann architectures suffer from both the "memory wall" and the "power wall", which limit data transfer between memory and processing units [1,2]. Compute-in-memory (CIM) technologies, particularly analogue CIM with memristor crossbars, are promising because of their high energy efficiency, computational parallelism, and integration density for NN computations [3]. In practical applications, analogue CIM excels in tasks such as speech recognition and image classification, revealing its unique advantages. For instance, it efficiently processes vast amounts of audio data in speech recognition, achieving high accuracy with minimal power consumption. In image classification, the high parallelism of analogue CIM significantly speeds up feature extraction and reduces processing time. With the rapid development of AI applications, the demands for computational accuracy and task complexity are rising continually. However, analogue CIM systems are limited in handling complex regression tasks that require precise floating-point (FP) calculations. They are primarily suited to classification tasks with low data precision and a limited dynamic range [4].
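The parallelism and the precision limit of analogue CIM both follow from how a crossbar computes: input voltages on the rows, conductances at the crosspoints, and column currents that sum by Kirchhoff's current law. The sketch below is an idealized model under stated assumptions, not the system of the abstract: the differential-pair weight mapping, the number of conductance levels, the normalized units, and all function names are hypothetical, and wire resistance, noise, and device nonlinearity are ignored.

```python
def quantize(value, levels, g_max):
    """Snap a target conductance to one of `levels` evenly spaced states in [0, g_max]."""
    step = g_max / (levels - 1)
    clipped = min(max(value, 0.0), g_max)
    return round(clipped / step) * step


def program_crossbar(weights, levels=16, g_max=1.0):
    """Map each signed weight in [-1, 1] to a differential conductance pair (G+, G-)."""
    g_pos = [[quantize(max(w, 0.0) * g_max, levels, g_max) for w in row] for row in weights]
    g_neg = [[quantize(max(-w, 0.0) * g_max, levels, g_max) for w in row] for row in weights]
    return g_pos, g_neg


def crossbar_mvm(g_pos, g_neg, voltages):
    """Column current j = sum over rows i of V_i * (G+_ij - G-_ij)."""
    n_cols = len(g_pos[0])
    return [
        sum(v * (g_pos[i][j] - g_neg[i][j]) for i, v in enumerate(voltages))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    weights = [[1.0, -1.0], [0.0, 1.0]]  # rows are inputs, columns are outputs
    g_pos, g_neg = program_crossbar(weights)
    print(crossbar_mvm(g_pos, g_neg, [0.5, 0.25]))
```

All column currents appear at once, which is the source of the parallelism. The `levels` parameter models the finite number of programmable states: a weight that falls between two states is rounded, which is one reason such arrays fit low-precision classification better than floating-point regression.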
As a typical in-memory computing hardware design, nonvolatile ternary content-addressable memories (TCAMs) combine logic operation and data storage for high throughput in parallel big data processing. However, TCAM cells based on conventional silicon-based devices suffer from structural complexity and large footprint limitations. Here, we demonstrate an ultrafast nonvolatile TCAM cell based on the MoTe2/hBN/multilayer graphene (MLG) van der Waals heterostructure using a top-gated partial floating-gate field-effect transistor (PFGFET) architecture. Based on its ambipolar transport properties, the carrier type in the source/drain and central channel regions of the MoTe2 channel can be efficiently tuned by the control gate and top gate, respectively, enabling reconfigurable operation of the device in either memory or FET mode. When working in the memory mode, it achieves an ultrafast 60 ns programming/erase speed with a current on-off ratio of ∼10^5, excellent retention capability, and robust endurance. When serving as a reconfigurable transistor, unipolar p-type and n-type FETs are obtained by applying ultrafast 60 ns control-gate voltage pulses with different polarities. The monolithic integration of memory and logic within a single device enables content-addressable memory (CAM) functionality. Finally, by integrating two PFGFETs in parallel, a TCAM cell with a high current ratio of ∼10^5 between the match and mismatch states is achieved without requiring additional peripheral circuitry. These results provide a promising route for the design of high-performance TCAM devices for future in-memory computing applications.
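For readers unfamiliar with ternary matching, the following sketch shows the logical behaviour a TCAM array implements: each stored digit is 0, 1, or a "don't care" state, and a search key is compared against every stored word. This is a software model only. The string encoding with "X" as the don't-care symbol and the function names are assumptions for illustration, and nothing here models the PFGFET device physics.

```python
def tcam_match(stored, key):
    """A stored word matches when every digit is 'X' (don't care) or equals the key bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same length")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(table, key):
    """Return the indices of all matching words.

    Hardware compares every row at once; this loop only mimics the result.
    """
    return [i for i, word in enumerate(table) if tcam_match(word, key)]


if __name__ == "__main__":
    table = ["10X1", "1XX0", "0000"]
    for key in ("1011", "1010", "0000", "0111"):
        print(key, tcam_search(table, key))
```

In a physical array, the match or mismatch of each row is read out as a high or low current on that row's match line, which is why a large current ratio between the two states matters for reliable sensing.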
Neuromorphic computing devices leveraging HfO_2 and ZrO_2 materials have recently garnered significant attention due to their potential for brain-inspired computing systems. In this study, we present a novel trilayer Pt/HfO_2/ZrO_(2-x)/HfO_2/TiN memristor, engineered with a ZrO_(2-x) oxygen vacancy reservoir (OVR) layer fabricated via radio frequency (RF) sputtering under a controlled oxygen ambient. The incorporation of the ZrO_(2-x) OVR layer enables enhanced resistive switching characteristics, including a high ON/OFF ratio (∼8000), excellent uniformity, robust data retention (>10^5 s), and multilevel storage capabilities. Furthermore, the memristor demonstrates superior synaptic plasticity with linear long-term potentiation (LTP) and depression (LTD), achieving low non-linearity values of 1.36 (LTP) and 0.66 (LTD), and a recognition accuracy of 95.3% in an MNIST dataset simulation. The unique properties of the ZrO_(2-x) layer, particularly its ability to act as a dynamic oxygen vacancy reservoir, significantly enhance synaptic performance by stabilizing oxygen vacancy migration. These findings establish the OVR-trilayer memristor as a promising candidate for future neuromorphic computing and high-performance memory applications.
Resistive random access memory (RRAM) has stimulated a variety of promising applications, including programmable analog circuits, massive data storage, and neuromorphic computing. These emerging applications place heavy demands on high integration density and low power consumption. The cross-point configuration, or passive array, which offers the smallest cell footprint and the capability of multi-layer stacking, has received broad attention from the research community. In such an array, correct reading and writing of a cell relies on effective elimination of the sneak current coming from neighboring cells. This requires nonlinear I-V characteristics in the memory cell, which can be realized either by adding a separate selector or by developing cells with implicit built-in nonlinearity. The performance of a passive array depends largely on cell nonlinearity, reliability, on/off ratio, line resistance, thermal coupling, and related factors. This article provides a comprehensive review of the progress achieved in 3D RRAM integration. First, the authors give a brief overview of the problems associated with passive arrays and of the categories of 3D architectures. Next, the state of the art in the development of various selector devices and self-selective cells is presented. Key parameters that influence device nonlinearity and current density are outlined according to the corresponding working principles. Then, the reliability issues in 3D arrays are summarized in terms of uniformity, endurance, retention, and disturbance. Subsequently, scaling issues and thermal crosstalk in 3D memory arrays are thoroughly discussed, followed by applications of 3D RRAM beyond storage, such as neuromorphic computing and CMOL circuits. A summary and outlook are given at the end.
Many fish use undulatory fins to propel themselves underwater. These locomotor mechanisms have attracted the interest of many researchers. In the present study, we perform a three-dimensional unsteady computation of an undulatory mechanical fin driven by Shape Memory Alloy (SMA). The objective of the computation is to investigate the fluid dynamics of force production associated with the undulatory mechanical fin. An unstructured, grid-based, unsteady Navier-Stokes solver with automatic adaptive remeshing is used to compute the unsteady flow around the fin through five complete cycles. The pressure distribution on the fin surface is computed and integrated to provide the fin forces, which are decomposed into lift and thrust. The velocity field is also computed throughout the swimming cycle. Finally, a comparison is conducted to reveal how force generation depends on the kinematic parameters of the undulatory fin (amplitude, frequency, and wavelength).
Simulation has become essential in designing and developing new casting products and in improving manufacturing processes within limited time, because it can reproduce the nature of the process so that developers can arrive at ideal casting designs. Many development groups around the world are making every effort to gain an early lead in the commercial simulation market. They have already reported success stories in manufacturing by developing and providing high-performance, multipurpose simulation technologies. However, these tools mainly run on powerful desk-side computers operated by well-trained experts, so it is hard to spread the concept of scientific design to newcomers in the casting field. To overcome the resulting problems in scientific casting design, we used information technologies and mature hardware infrastructure to spread an effective and scientific casting design mindset, and integrated them all into a web-based Simulation Portal. It offers scientific casting design on the net, with ubiquitous access embodied by the "Anyone, Anytime, Anywhere" concept for casting design.
Traditional digital processing approaches are based on semiconductor transistors, which suffer from high power consumption that worsens as technology nodes scale down. To solve this problem definitively, a number of emerging non-volatile nanodevices are under intense investigation. Meanwhile, novel computing circuits are being invented to exploit the full potential of these nanodevices. The combination of non-volatile nanodevices with suitable computing paradigms has many merits compared with structures based on complementary metal-oxide-semiconductor (CMOS) transistor technology, such as zero standby power, ultra-high density, non-volatility, and acceptable access speed. In this paper, we review and compare the computing paradigms based on emerging nanodevices that target ultra-low dissipation.
With the rapid development of big data and artificial intelligence (AI), the cloud platform architecture system is constantly developing, optimizing, and improving. As such, new applications, like deep computing and high-performance computing, require enhanced computing power. To meet this requirement, a non-uniform memory access (NUMA) configuration method is proposed for the cloud computing system according to the affinity, adaptability, and availability of the NUMA architecture processor platform. The proposed method is verified in a test environment based on a domestic central processing unit (CPU).
In 2004, Jeff Hawkins presented a memory-prediction theory of brain function, and later used it to create the Hierarchical Temporal Memory model. Several of the concepts described in the theory are applied here in a computer vision system for a mobile robot application. The aim was to produce a system enabling a mobile robot to explore its environment and recognize different types of objects without human supervision. The operator can assign names to the identified objects of interest. The system presented here works with time-ordered sequences of images. It uses a tree structure of connected computational nodes similar to Hierarchical Temporal Memory and memorizes frequent sequences of events. The structure of the proposed system and the algorithms involved are explained. A brief survey of existing algorithms applicable in the system is provided, and future applications are outlined. Problems that can arise when the robot's velocity changes are listed, and a solution is proposed. The proposed system was tested on a sequence of images recorded by two parallel cameras moving in a real-world environment. Results for mono and stereo vision experiments are presented.
As a large amount of data is increasingly generated by edge devices, such as smart homes, mobile phones, and wearable devices, it becomes crucial for many applications to deploy machine learning models across edge devices. The execution speed of the deployed model is a key element in ensuring service quality. For highly heterogeneous edge deployment scenarios, deep learning compilation is a novel approach that aims to solve this problem. It defines models using certain DSLs and generates efficient code implementations for different hardware devices. However, two aspects have not yet been thoroughly investigated. The first is the optimization of memory-intensive operations, and the second is the heterogeneity of the deployment target. To that end, in this work, we propose a system solution that optimizes memory-intensive operations, optimizes the subgraph distribution, and enables the compilation and deployment of DNN models on multiple targets. The evaluation results show the performance of our proposed system.
In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) systems. The technique mainly targets a single node with a multi-core processor, which distinguishes it from approaches based on cluster or network computing architectures. Because of its specialized applications (such as medical image processing) and its demanding hardware resource requirements, CBIR has been prevented from being widely used. Given the increasing volume of image databases, the widespread use of multi-core processors, and the requirements on retrieval accuracy and speed, a retrieval strategy based on multi-core processors is needed to make retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of the retrieval system. In addition, we propose an efficient parallel technique that combines cluster and multi-core techniques, which is intended to suit the new trend of cloud computing.
Funding (skyrmion logic device study): support from the National Natural Science Foundation of China (Grant No. 12474101; Grant Nos. 52272202 and W2421027; Grant No. 52501307).
Funding (layer-pseudospin probabilistic computing study): supported by the National Natural Science Foundation of China (Grant Nos. 12322407, 62122036, and 62034004), the Natural Science Foundation of Jiangsu Province (Grant No. BK20233001), the National Key R&D Program of China (Grant Nos. 2023YFF0718400 and 2023YFF1203600), the Leading-edge Technology Program of Jiangsu Natural Science Foundation (Grant No. BK20232004), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB44000000), the Innovation Program for Quantum Science and Technology, the Fundamental Research Funds for the Central Universities (Grant Nos. 020414380227, 020414380240, and 020414380242), and the e-Science Center of Collaborative Innovation Center of Advanced Microstructures.
Funding (quasi-convolution reservoir computing study): National Natural Science Foundation of China (62171305, 62405206, 62004135, 62001317, 62111530301); Natural Science Foundation of Jiangsu Province (BK20240778, BK20241917); State Key Laboratory of Advanced Optical Communication Systems and Networks, China (2023GZKF08); China Postdoctoral Science Foundation (2024M752314); Postdoctoral Fellowship Program of CPSF (GZC20231883); Innovative and Entrepreneurial Talent Program of Jiangsu Province (JSSCRC2021527).
Funding (RRAM/memristor review): Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2019R1F1A1057243), together with the Future Semiconductor Device Technology Development Program (20003808, 10080689, 20004399) funded by MOTIE (Ministry of Trade, Industry & Energy) and KSRC (Korea Semiconductor Research Consortium).
Funding: the National Key Research and Development Program of China (2018YFB2202602), the State Key Program of the National Natural Science Foundation of China (No. 61934005), the National Natural Science Foundation of China (No. 62074001), and the Joint Funds of the National Natural Science Foundation of China under Grant U19A2074.
Abstract: Artificial intelligence (AI) processes data-centric applications with minimal effort. However, it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications due to the separation of computation and storage. The emergence of computing in-memory (CIM) is significant in circumventing the von Neumann bottleneck. A commercialized memory architecture, static random-access memory (SRAM), is fast and robust, consumes less power, and is compatible with state-of-the-art technology. This study investigates the research progress of SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.
Funding: the support of the National Natural Science Foundation of China (Grant No. 62204201).
Abstract: In the past decade, there has been tremendous progress in integrating chalcogenide phase-change materials (PCMs) on the silicon photonic platform for applications ranging from non-volatile memory to neuromorphic in-memory computing. In particular, these non-von Neumann computational elements and systems benefit from mass manufacturing of silicon photonic integrated circuits (PICs) on 8-inch wafers using a 130 nm complementary metal-oxide semiconductor line. Chip manufacturing based on deep-ultraviolet lithography and electron-beam lithography enables rapid prototyping of PICs, which can be integrated with high-quality PCMs based on the wafer-scale sputtering technique as a back-end-of-line process. In this article, we present an overview of recent advances in waveguide-integrated PCM memory cells, functional devices, and neuromorphic systems, with an emphasis on fabrication and integration processes to attain state-of-the-art device performance. After a short overview of PCM-based photonic devices, we discuss the materials properties of the functional layer as well as the progress on the light-guiding layer, namely, the silicon and germanium waveguide platforms. Next, we discuss the cleanroom fabrication flow of waveguide devices integrated with thin films and nanowires, silicon waveguides, and plasmonic microheaters for the electrothermal switching of PCMs and mixed-mode operation. Finally, the fabrication of photonic and photonic–electronic neuromorphic computing systems is reviewed. These systems consist of arrays of PCM memory elements for associative learning, matrix-vector multiplication, and pattern recognition. With large-scale integration, the neuromorphic photonic computing paradigm holds the promise to outperform digital electronic accelerators by taking advantage of ultra-high bandwidth, high speed, and energy-efficient operation in running machine learning algorithms.
Funding: supported by the National Natural Science Foundation of China (Nos. 62034006, 91964105, 61874068), the China Key Research and Development Program (No. 2016YFA0201802), the Natural Science Foundation of Shandong Province (No. ZR2020JQ28), and the Program of Qilu Young Scholars of Shandong University.
Abstract: The "memory wall" of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution, while in-memory computing (IMC) architecture is a promising approach to breaking the bottleneck. Although variations and instability in ultra-scaled memory cells seriously degrade the calculation accuracy in IMC architectures, stochastic computing (SC) can compensate for these shortcomings due to its low sensitivity to cell disturbances. Furthermore, massively parallel computing can be performed to improve the speed and efficiency of the system. In this paper, by designing logic functions in NOR flash arrays, SC in IMC for image edge detection is realized, demonstrating ultra-low computational complexity and power consumption (25.5 fJ/pixel at 2-bit sequence length). More impressively, the noise immunity is 6 times higher than that of the traditional binary method, showing good tolerance to cell variation and reliability degradation when implementing massively parallel computation in the array.
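As a rough illustration of why stochastic computing maps onto simple logic functions, the sketch below encodes values in [0, 1] as random bitstreams and uses plain AND and XOR operations on them. It is a generic textbook-style model, not the NOR flash circuit of the abstract; the function names, the stream length of 4096, and the use of XOR on correlated streams for an absolute difference (a common building block of edge detectors) are assumptions made for this example.

```python
import random

def to_stream(value, rands):
    """Encode a value in [0, 1] as a bitstream: bit i is 1 when rands[i] < value."""
    return [1 if r < value else 0 for r in rands]

def from_stream(bits):
    """Decode a bitstream back to a value: the fraction of 1s."""
    return sum(bits) / len(bits)

def sc_multiply(a, b, length=4096, seed=0):
    """Approximate a * b with a bitwise AND of two independent streams."""
    rng = random.Random(seed)
    rands_a = [rng.random() for _ in range(length)]
    rands_b = [rng.random() for _ in range(length)]
    stream_a = to_stream(a, rands_a)
    stream_b = to_stream(b, rands_b)
    return from_stream([x & y for x, y in zip(stream_a, stream_b)])

def sc_abs_diff(a, b, length=4096, seed=0):
    """Approximate |a - b| with a bitwise XOR of two correlated streams.

    Both streams share the same random numbers, so the bits differ
    exactly when a random number falls between a and b.
    """
    rng = random.Random(seed)
    shared = [rng.random() for _ in range(length)]
    stream_a = to_stream(a, shared)
    stream_b = to_stream(b, shared)
    return from_stream([x ^ y for x, y in zip(stream_a, stream_b)])
```

Because every bit carries the same weight, a single flipped bit shifts the decoded value by only 1/length, which is one way to see the low sensitivity to cell disturbances mentioned above. The price is accuracy that improves only slowly with stream length.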
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61925402 and 61851402), the Science and Technology Commission of Shanghai Municipality, China (Grant No. 19JC1416600), the National Key Research and Development Program of China (Grant No. 2017YFB0405600), and the Shanghai Education Development Foundation and Shanghai Municipal Education Commission Shuguang Program, China (Grant No. 18SG01).
Abstract: Facing the computing demands of the Internet of things (IoT) and artificial intelligence (AI), the cost induced by moving data between the central processing unit (CPU) and memory is the key problem, and a chip featuring flexible structural units, ultra-low power consumption, and huge parallelism will be needed. In-memory computing, a non-von Neumann architecture fusing memory units and computing units, can eliminate the data transfer time and energy consumption while performing massively parallel computations. Prototype in-memory computing schemes modified from different memory technologies have shown orders-of-magnitude improvement in computing efficiency, leading it to be regarded as the ultimate computing paradigm. Here we review the state-of-the-art memory device technologies with potential for in-memory computing, summarize their versatile applications in neural networks, stochastic generation, and hybrid-precision digital computing, with promising solutions for unprecedented computing tasks, and also discuss the challenges of stability and integration for general in-memory computing.
Abstract: Developing efficient neural network (NN) computing systems is crucial in the era of artificial intelligence (AI). Traditional von Neumann architectures suffer from both the "memory wall" and the "power wall", limiting the data transfer between memory and processing units [1,2]. Compute-in-memory (CIM) technologies, particularly analogue CIM with memristor crossbars, are promising because of their high energy efficiency, computational parallelism, and integration density for NN computations [3]. In practical applications, analogue CIM excels in tasks like speech recognition and image classification, revealing its unique advantages. For instance, it efficiently processes vast amounts of audio data in speech recognition, achieving high accuracy with minimal power consumption. In image classification, the high parallelism of analogue CIM significantly speeds up feature extraction and reduces processing time. With the rapid development of AI applications, the demands for computational accuracy and task complexity are rising continually. However, analogue CIM systems are limited in handling complex regression tasks that need precise floating-point (FP) calculations. They are primarily suited for classification tasks with low data precision and a limited dynamic range [4].
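The parallelism and the precision limit described above can both be seen in an idealized model of a memristor crossbar, sketched below. Each column current is the sum of conductance times row voltage (Ohm's law plus Kirchhoff's current law), so a whole matrix-vector product is obtained in one read step. The function names and the uniform conductance quantization are assumptions for illustration; real arrays also have wire resistance, noise, and device non-idealities that this sketch ignores.

```python
def crossbar_mvm(conductances, voltages):
    """Idealized crossbar read: column current I_j = sum_i G[i][j] * V[i].

    conductances[i][j] is the cell at row i, column j (in siemens);
    voltages[i] is the voltage applied to row i (in volts).
    """
    n_cols = len(conductances[0])
    return [
        sum(row[j] * v for row, v in zip(conductances, voltages))
        for j in range(n_cols)
    ]

def quantize(g, levels, g_max):
    """Snap a conductance to the nearest of `levels` evenly spaced states in [0, g_max]."""
    step = g_max / (levels - 1)
    return round(g / step) * step

def quantized_mvm(conductances, voltages, levels, g_max):
    """Crossbar read after storing each weight with a limited number of states."""
    stored = [[quantize(g, levels, g_max) for g in row] for row in conductances]
    return crossbar_mvm(stored, voltages)
```

With only a handful of programmable states per cell, the stored weights differ from the intended ones, which hints at why such arrays are better matched to low-precision classification than to floating-point regression.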
Funding: supported by the National Key Research & Development Projects of China (Grant No. 2022YFA1204100), the National Natural Science Foundation of China (Grant No. 62488201), the CAS Project for Young Scientists in Basic Research (YSBR-003), and the Innovation Program of Quantum Science and Technology (2021ZD0302700).
Abstract: As a typical in-memory computing hardware design, nonvolatile ternary content-addressable memories (TCAMs) enable logic operation and data storage for high throughput in parallel big data processing. However, TCAM cells based on conventional silicon-based devices suffer from structural complexity and large footprint limitations. Here, we demonstrate an ultrafast nonvolatile TCAM cell based on the MoTe2/hBN/multilayer graphene (MLG) van der Waals heterostructure using a top-gated partial floating-gate field-effect transistor (PFGFET) architecture. Based on its ambipolar transport properties, the carrier type in the source/drain and central channel regions of the MoTe2 channel can be efficiently tuned by the control gate and top gate, respectively, enabling the reconfigurable operation of the device in either memory or FET mode. When working in the memory mode, it achieves an ultrafast 60 ns programming/erase speed with a current on-off ratio of ∼10^(5), excellent retention capability, and robust endurance. When serving as a reconfigurable transistor, unipolar p-type and n-type FETs are obtained by adopting ultrafast 60 ns control-gate voltage pulses with different polarities. The monolithic integration of memory and logic within a single device enables the content-addressable memory (CAM) functionality. Finally, by integrating two PFGFETs in parallel, a TCAM cell with a high current ratio of ∼10^(5) between the match and mismatch states is achieved without requiring additional peripheral circuitry. These results provide a promising route for the design of high-performance TCAM devices for future in-memory computing applications.
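For readers unfamiliar with ternary content-addressable memory, the behavioral sketch below shows what "match" and "mismatch" mean at the logical level: each stored bit is 0, 1, or a don't-care state (written here as "X"), and a search key is compared against all stored words at once. This models only the lookup semantics, not the PFGFET device; the function names and the string encoding are assumptions for this example.

```python
def tcam_match(stored, key):
    """Return True when every stored bit equals the key bit or is a don't-care 'X'."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have the same length")
    return all(s == "X" or s == k for s, k in zip(stored, key))

def tcam_search(table, key):
    """Return the indices of all stored words that match the key.

    In hardware every word is compared in parallel; the loop here
    only emulates that behavior sequentially.
    """
    return [i for i, word in enumerate(table) if tcam_match(word, key)]

# Example table: "X" bits match either 0 or 1 in the search key.
table = ["10X1", "1XX0", "0000"]
```

In a physical cell, the match and mismatch outcomes appear as two clearly separated current levels on a match line, which is why the current ratio between the two states matters.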
Funding: financially supported by the National Research Foundation of Korea (no. NRF-2021R1A2C2010781) grant funded by the Korean Government (Ministry of Science and ICT), the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (no. P0012451, The Competency Development Program for Industry Specialist), and the Korea Government (MOTIE) (no. P0020966, HRD Program for Industrial Innovation).
Abstract: Neuromorphic computing devices leveraging HfO_(2) and ZrO_(2) materials have recently garnered significant attention due to their potential for brain-inspired computing systems. In this study, we present a novel trilayer Pt/HfO_(2)/ZrO_(2-x)/HfO_(2)/TiN memristor, engineered with a ZrO_(2-x) oxygen vacancy reservoir (OVR) layer fabricated via radio frequency (RF) sputtering under controlled oxygen ambient. The incorporation of the ZrO_(2-x) OVR layer enables enhanced resistive switching characteristics, including a high ON/OFF ratio (∼8000), excellent uniformity, robust data retention (>10^(5) s), and multilevel storage capabilities. Furthermore, the memristor demonstrates superior synaptic plasticity with linear long-term potentiation (LTP) and depression (LTD), achieving low non-linearity values of 1.36 (LTP) and 0.66 (LTD), and a recognition accuracy of 95.3% in an MNIST dataset simulation. The unique properties of the ZrO_(2-x) layer, particularly its ability to act as a dynamic oxygen vacancy reservoir, significantly enhance synaptic performance by stabilizing oxygen vacancy migration. These findings establish the OVR-trilayer memristor as a promising candidate for future neuromorphic computing and high-performance memory applications.
Funding: the National Key R&D Program of China (Grant Nos. 2018YFB0407501 and 2016YFA0201800), the National Natural Science Foundation of China (Grant Nos. 61804173, 61922083, 61804167, 61904200, and 61821091), and the fourth China Association for Science and Technology Youth Talent Support Project (Grant No. 2019QNRC001).
Abstract: Resistive random access memory (RRAM) has stimulated a variety of promising applications, including programmable analog circuits, massive data storage, and neuromorphic computing. These emerging applications place heavy demands on high integration density and low power consumption. The cross-point configuration, or passive array, which offers the smallest cell footprint and the capability of multi-layer stacking, has received broad attention from the research community. In such an array, correct reading and writing of a cell relies on effective elimination of the sneak current coming from the neighboring cells. This requires nonlinear I-V characteristics of the memory cell, which can be realized either by adding a separate selector or by developing cells with implicit built-in nonlinearity. The performance of a passive array largely depends on the cell nonlinearity, reliability, on/off ratio, line resistance, thermal coupling, etc. This article provides a comprehensive review of the progress achieved in 3D RRAM integration. First, the authors give a brief overview of the associated problems in passive arrays and the categories of 3D architectures. Next, the state of the art in the development of various selector devices and self-selective cells is presented. Key parameters that influence the device nonlinearity and current density are outlined according to the corresponding working principles. Then, the reliability issues in 3D arrays are summarized in terms of uniformity, endurance, retention, and disturbance. Subsequently, the scaling issue and thermal crosstalk in 3D memory arrays are thoroughly discussed, followed by applications of 3D RRAM beyond storage, such as neuromorphic computing and CMOL circuits. A summary and outlook are given at the end.
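To see why sneak current limits passive arrays, the sketch below uses a common lumped worst-case approximation: in an N×N array with floating unselected lines and every unselected cell in its low-resistance state, the sneak network seen in parallel with the selected cell behaves like three groups of parallel cells in series, containing (N-1), (N-1)^2, and (N-1) cells. The function names are hypothetical, and the model assumes linear (selector-less) cells and zero line resistance, so it illustrates the trend rather than any design in the review.

```python
def sneak_resistance(n, r_cell):
    """Worst-case sneak-path resistance of an n x n passive array.

    Assumes all unselected cells have resistance r_cell and
    unselected word and bit lines are left floating.
    """
    if n < 2:
        raise ValueError("a sneak path needs at least a 2 x 2 array")
    m = n - 1
    return r_cell / m + r_cell / (m * m) + r_cell / m

def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)

def read_current_ratio(n, r_on, r_off):
    """Ratio of read currents for a selected cell in the ON versus OFF state.

    The sneak network (all neighbors ON) sits in parallel with the
    selected cell, so the ratio collapses toward 1 as n grows.
    """
    r_sneak = sneak_resistance(n, r_on)
    return parallel(r_off, r_sneak) / parallel(r_on, r_sneak)
```

Under these assumptions, a device with a 1000x on/off ratio gives a read ratio of only about 4 in a 2×2 array and close to 1 in a 64×64 array. A selector or a strongly nonlinear cell raises the effective resistance of the half-biased neighbors, which is what restores the read margin.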
Abstract: Many fishes use undulatory fins to propel themselves in the underwater environment. These locomotor mechanisms are of interest to many researchers. In the present study, we perform a three-dimensional unsteady computation of an undulatory mechanical fin that is driven by Shape Memory Alloy (SMA). The objective of the computation is to investigate the fluid dynamics of force production associated with the undulatory mechanical fin. An unstructured, grid-based, unsteady Navier-Stokes solver with automatic adaptive remeshing is used to compute the unsteady flow around the fin through five complete cycles. The pressure distribution on the fin surface is computed and integrated to provide fin forces, which are decomposed into lift and thrust. The velocity field is also computed throughout the swimming cycle. Finally, a comparison is conducted to reveal how the dynamics of force generation depend on the kinematic parameters of the undulatory fin (amplitude, frequency, and wavelength).
Abstract: Simulation has become essential in designing or developing new casting products and in improving manufacturing processes within limited time, because it can reproduce the nature of the process so that developers can arrive at ideal casting designs. To gain a leading position in the commercial simulation market, many development groups around the world are making every effort. They have already reported success stories in manufacturing by developing and providing high-performance, multipurpose simulation technologies. However, these tools mainly run on powerful desk-side computers operated by well-trained experts, so it is hard to spread the scientific design concept to newcomers in the casting field. To overcome these problems in scientific casting design, we used information technologies and mature hardware backbones to spread an effective and scientific approach to casting design, and integrated them into a Simulation Portal on the web. It offers scientific casting design on the net, including ubiquitous access represented by the "Anyone, Anytime, Anywhere" concept for casting design.
Abstract: Traditional digital processing approaches are based on semiconductor transistors, which suffer from high power consumption that worsens with technology node scaling. To solve this problem definitively, a number of emerging non-volatile nanodevices are under intense investigation. Meanwhile, novel computing circuits are being invented to exploit the full potential of the nanodevices. The combination of non-volatile nanodevices with suitable computing paradigms has many merits compared with structures based on complementary metal-oxide-semiconductor (CMOS) transistor technology, such as zero standby power, ultra-high density, non-volatility, and acceptable access speed. In this paper, we overview and compare the computing paradigms based on the emerging nanodevices towards ultra-low dissipation.
Funding: the National Key Research and Development Program of China (No. 2017YFC0212100) and the National High-tech R&D Program of China (No. 2015AA015308).
Abstract: With the rapid development of big data and artificial intelligence (AI), the cloud platform architecture system is constantly developing, optimizing, and improving. As such, new applications, like deep computing and high-performance computing, require enhanced computing power. To meet this requirement, a non-uniform memory access (NUMA) configuration method is proposed for the cloud computing system according to the affinity, adaptability, and availability of the NUMA architecture processor platform. The proposed method is verified in the test environment of a domestic central processing unit (CPU).
Abstract: In 2004, Jeff Hawkins presented a memory-prediction theory of brain function, and later used it to create the Hierarchical Temporal Memory model. Several of the concepts described in the theory are applied here in a computer vision system for a mobile robot application. The aim was to produce a system enabling a mobile robot to explore its environment and recognize different types of objects without human supervision. The operator has means to assign names to the identified objects of interest. The system presented here works with time-ordered sequences of images. It utilizes a tree structure of connected computational nodes similar to Hierarchical Temporal Memory and memorizes frequent sequences of events. The structure of the proposed system and the algorithms involved are explained. A brief survey of the existing algorithms applicable in the system is provided and future applications are outlined. Problems that can arise when the robot's velocity changes are listed, and a solution is proposed. The proposed system was tested on a sequence of images recorded by two parallel cameras moving in a real-world environment. Results for mono- and stereo vision experiments are presented.
Funding: supported by the National Natural Science Foundation of China (U21A20519).
Abstract: As a large amount of data is increasingly generated from edge devices, such as smart homes, mobile phones, and wearable devices, it becomes crucial for many applications to deploy machine learning models across edge devices. The execution speed of the deployed model is a key element in ensuring service quality. Considering a highly heterogeneous edge deployment scenario, deep learning compiling is a novel approach that aims to solve this problem. It defines models using certain DSLs and generates efficient code implementations on different hardware devices. However, two aspects have not yet been thoroughly investigated. The first is the optimization of memory-intensive operations, and the second is the heterogeneity of the deployment target. To that end, in this work, we propose a system solution that optimizes memory-intensive operations, optimizes the subgraph distribution, and enables the compiling and deployment of DNN models on multiple targets. The evaluation results show the performance of our proposed system.
Funding: supported by the Natural Science Foundation of Shanghai (Grant No. 08ZR1408200), the Shanghai Leading Academic Discipline Project (Grant No. J50103), and the Open Project Program of the National Laboratory of Pattern Recognition.
Abstract: In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) systems. This technique is mainly intended for a single node with a multi-core processor, which differs from techniques based on cluster or network computing architectures. Due to its specific applications (such as medical image processing) and its demanding hardware resource requirements, the CBIR system has been prevented from being widely used. With the increasing volume of image databases, the widespread use of multi-core processors, and the requirements on retrieval accuracy and speed, a retrieval strategy based on multi-core processors is needed to make retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of the retrieval system. In addition, we also propose an efficient parallel technique that combines the cluster and multi-core techniques, which is intended to suit the new trend of cloud computing.