Abstract: This paper reports rotational actuators controlled by hardware neural networks and their application to an insect-type micro robot. Millimeter-size rotational actuators are fabricated by combining MEMS (Micro Electro Mechanical System) technology with shape-memory-alloy-based artificial muscle wires. The actuator is composed of a pair of disk rotors, and each rotor is suspended by four artificial muscle wires that are connected to the silicon frame. Rotational motion is generated by passing electrical current through each wire in succession. Two actuators of different sizes are fabricated. The large actuator shows a displacement of 0.5 mm at a cycle time of 4 s; the small actuator shows 0.3 mm at 2 s. The actuator is controlled by hardware neural networks composed of electrical circuits that imitate cell bodies, excitatory synapses, and inhibitory synapses. Four signal ports are taken from four pairs of excitatory and inhibitory neurons and connected to the actuator. The small actuator is built into the mid-body of the robot. The shaft of the actuator is connected to link mechanisms that transform the rotational motion into locomotion. The outer dimensions of the robot are 4.0 mm (width), 2.7 mm (length), and 2.5 mm (height). The robot steps forward and backward like an insect. Its speed is 26.4 mm·min^-1 and its step width is 0.88 mm. The robot also changes direction in response to external trigger pulses.
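The reported speed and step width can be cross-checked against the actuator cycle time. The short Python sketch below is only an arithmetic check, not something taken from the paper: the function name `step_period_s` is hypothetical, and reading the result as "one step per actuator cycle" is an assumption that the abstract does not state.

```python
def step_period_s(speed_mm_per_min: float, step_mm: float) -> float:
    """Return the time per step in seconds for a given speed and step width."""
    steps_per_min = speed_mm_per_min / step_mm  # steps taken in one minute
    return 60.0 / steps_per_min


# Values quoted in the abstract: 26.4 mm/min speed, 0.88 mm step width.
period = step_period_s(26.4, 0.88)
print(f"time per step: {period:.2f} s")  # prints "time per step: 2.00 s"
```

The result, about 2 s per step (30 steps per minute), agrees with the 2 s cycle time quoted for the small actuator, which is what one would expect if each actuator cycle produces one step.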
Abstract: Robots are widely used, providing significant convenience in daily life and production. With the rapid development of artificial intelligence and neuromorphic computing in recent years, the realization of more intelligent robots through a profound intersection of neuroscience and robotics has received much attention. Neuromorphic circuits based on memristors, used to construct hardware neural networks, have proved to be a promising way of overcoming traditional control limitations in robot control, with characteristics that enhance robot intelligence, speed, and energy efficiency. Starting with the working mechanism of memristors and peripheral circuit design, this review gives a comprehensive analysis of the biomimetic information processing and biomimetic driving operations achieved by using neuromorphic circuits in brain-like control. Four hardware neural network approaches are summarized: digital-analog hybrid circuit design, novel device structure design, multi-regulation mechanisms, and crossbar arrays. These approaches can simulate, at the hardware level, the brain's motor decision-making mechanism, multi-information integration, and parallel control. The review is expected to help promote the application of memristor-based neuromorphic circuits in areas such as intelligent robotics, artificial intelligence, and neural computing. Finally, a conclusion and future prospects are discussed.
Abstract: Deep neural networks have evolved remarkably over the past few years and are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a number of accelerators based on a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression, and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher–student networks, compact network design, and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
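Two of the topics listed above, network pruning and network quantization, can be illustrated on a flat list of weights. The Python sketch below is a generic textbook-style illustration (magnitude pruning and symmetric 8-bit uniform quantization), not a method taken from the survey; the function names and the toy weights are made up for the example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate real-valued weights."""
    return [v * scale for v in q]


# Toy weights, invented for illustration only.
w = [0.5, -0.05, 0.2, -1.0]
pruned = magnitude_prune(w, 0.5)   # the two smallest magnitudes become 0.0
q, scale = quantize_int8(pruned)
approx = dequantize(q, scale)
print(pruned, q, approx)
```

Pruning reduces the number of nonzero weights, while quantization reduces the bits stored per weight; each dequantized value differs from the original by at most about half a quantization step.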
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant XDB44000000.
Abstract: Recently, large Transformer models have achieved impressive results in various natural language processing tasks, but they require enormous numbers of parameters and intensive computation, necessitating deployment on multi-device systems. Current solutions introduce complicated topologies with dedicated high-bandwidth interconnects to reduce communication overhead. To deal with the complexity of the system architecture and reduce the overhead of inter-device communication, this paper proposes SALTM, a multi-device system based on a unidirectional ring topology and a 2-D model partitioning method that takes quantization and pruning into account. First, a 1-D model partitioning method is proposed to reduce the amount of communication. Then, the block distributed on each device is further partitioned in the orthogonal direction, introducing a task-level pipeline to overlap communication and computation. To further explore SALTM's performance on a real large model such as GPT-3, we develop an analytical model to evaluate the performance and communication overhead. Our simulation shows that a BERT model with 110 million parameters, implemented by SALTM on four FPGAs, can achieve 9.65× and 1.12× speedups compared to a CPU and a GPU, respectively. The simulation also shows that the execution time of 4-FPGA SALTM is 1.52× that of an ideal system with infinite inter-device bandwidth. For GPT-3 with 175 billion parameters, our analytical model predicts that SALTM comprising 16 VC1502 FPGAs and 16 A30 GPUs can achieve inference latency of 287 ms and 164 ms, respectively.
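To make the idea of 1-D partitioning on a unidirectional ring concrete, the Python sketch below simulates a matrix-vector product whose weight rows are split across devices, followed by a ring all-gather in which each device only ever sends to its next neighbour. This is a generic illustration under stated assumptions, not SALTM's actual partitioning or scheduling: the function names are hypothetical, the row count is assumed to divide evenly by the device count, and the second (orthogonal) partitioning and the pipeline are not modelled.

```python
def matvec(rows, x):
    """Multiply a list of weight rows by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def ring_allgather(chunks):
    """Simulate an all-gather on a unidirectional ring.

    chunks[d] is the chunk initially held by device d. In each of the
    n - 1 steps, device d forwards the chunk it obtained most recently
    to device (d + 1) % n.
    """
    n = len(chunks)
    gathered = [{d: chunks[d]} for d in range(n)]
    sending = list(range(n))  # id of the chunk each device sends next
    for _ in range(n - 1):
        received = [sending[(d - 1) % n] for d in range(n)]
        for d in range(n):
            gathered[d][received[d]] = chunks[received[d]]
        sending = received
    # Reassemble the full vector on every device, in chunk order.
    return [[v for cid in range(n) for v in g[cid]] for g in gathered]


def partitioned_matvec(W, x, n_devices):
    """Row-partition W over the devices, compute locally, then all-gather."""
    rows_per_dev = len(W) // n_devices  # assumes an even split
    local = [
        matvec(W[d * rows_per_dev:(d + 1) * rows_per_dev], x)
        for d in range(n_devices)
    ]
    return ring_allgather(local)


# Toy 4x2 weight matrix and input, invented for illustration only.
W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
print(partitioned_matvec(W, x, 4))  # every device ends with the full result
```

With n devices the gather takes n - 1 forwarding steps, which is the communication cost that overlapping communication with computation aims to hide.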
Funding: Supported by the National Natural Science Foundation of China under Grant 62104025.
Abstract: Deep learning typically requires large amounts of labeled data and often struggles with generalization, posing challenges for intelligent systems. In the real world, most electrocardiogram (ECG) signals are unlabeled, which limits the use of smart devices in ECG-related applications. Unsupervised learning methods, such as contrastive learning, have emerged as a solution to this constraint. However, most contrastive learning encoders rely on deep neural networks with many parameters, making them unsuitable for hardware implementation. This article introduces a hardware-friendly universal ECG encoder with around 1k parameters, based on contrastive learning, and a fine-tuning framework for ECG-related tasks. We apply the encoder to a dual-task system for ECG-based arrhythmia classification and authentication, achieving 98.2% and 99.7% accuracy on the MIT-BIH dataset, respectively, with an FAR of 0.274 and an FRR of 0.707 for authentication. We propose a dynamic averaging template concatenation technique that significantly improves neural network generalization. We also develop an energy-efficient hardware architecture optimized for the entire system and successfully implement it on an FPGA.
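The authentication metrics quoted above follow standard definitions: the false acceptance rate (FAR) is the fraction of impostor attempts that are accepted, and the false rejection rate (FRR) is the fraction of genuine attempts that are rejected. The Python sketch below shows that calculation on invented toy data; the function name `far_frr` and the sample decisions are assumptions for illustration and are unrelated to the paper's MIT-BIH results.

```python
def far_frr(is_genuine, accepted):
    """Return (FAR, FRR) as fractions from per-attempt ground truth and decisions."""
    impostor = [a for g, a in zip(is_genuine, accepted) if not g]
    genuine = [a for g, a in zip(is_genuine, accepted) if g]
    far = sum(1 for a in impostor if a) / len(impostor)     # impostors let in
    frr = sum(1 for a in genuine if not a) / len(genuine)   # genuine users refused
    return far, frr


# Toy data, invented for illustration: four genuine and four impostor attempts.
is_genuine = [True, True, True, True, False, False, False, False]
accepted = [True, True, True, False, False, False, True, False]
print(far_frr(is_genuine, accepted))  # prints (0.25, 0.25)
```

Both rates depend on the decision threshold of the authenticator: making it stricter lowers FAR at the cost of a higher FRR, and vice versa.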