Spiking neural networks (SNNs) represent a paradigm shift toward discrete, event-driven neural computation that mirrors biological brain mechanisms. This survey systematically examines current SNN research, focusing on training methodologies, hardware implementations, and practical applications. We analyze four major training paradigms: ANN-to-SNN conversion, direct gradient-based training, spike-timing-dependent plasticity (STDP), and hybrid approaches. Our review covers the major specialized hardware platforms (Intel Loihi, IBM TrueNorth, SpiNNaker, and BrainScaleS) and analyzes their capabilities and constraints. We survey applications spanning computer vision, robotics, edge computing, and brain-computer interfaces, identifying where SNNs provide compelling advantages. Our comparative analysis reveals that SNNs offer significant energy-efficiency improvements (1000-10000× reduction) and natural temporal processing, while facing challenges in scalability and training complexity. We identify critical research directions, including improved gradient estimation, standardized benchmarking protocols, and hardware-software co-design approaches. This survey provides researchers and practitioners with a comprehensive understanding of current SNN capabilities, limitations, and future prospects.
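To make the "discrete, event-driven" computation in the survey abstract concrete, here is a minimal sketch of a single spiking neuron. The abstract does not name a neuron model, so the leaky integrate-and-fire (LIF) model, the function name `simulate_lif`, and every parameter value below are assumptions chosen for illustration only.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, v_reset=0.0):
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    All parameters are illustrative assumptions, not values from the survey.
    Returns a list of 0/1 spike events, one per input time step.
    """
    v = v_reset
    spikes = []
    for current in input_current:
        # Leak the membrane potential, then integrate the new input.
        v = leak * v + current
        if v >= threshold:
            # Emit a discrete spike event and reset the potential.
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(simulate_lif([0.6, 0.6, 0.0, 0.3, 0.9]))  # prints [0, 1, 0, 0, 1]
```

The output is non-zero only at the time steps where the neuron fires, which is the property event-driven hardware can exploit to skip work.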
On-device Artificial Intelligence (AI) accelerators capable of not only inference but also training neural network models are in increasing demand in the industrial AI field, where frequent production changes make frequent retraining crucial. Batch normalization (BN) is fundamental to training convolutional neural networks (CNNs), but its implementation in compact accelerator chips remains challenging due to computational complexity, particularly in calculating statistical parameters and gradients across mini-batches. Existing accelerator architectures either compromise the training accuracy of CNNs through approximations or require substantial computational resources, limiting their practical deployment. We present a hardware-optimized BN accelerator that maintains training accuracy while significantly reducing computational overhead through three novel techniques: (1) resource sharing for efficient resource utilization across forward and backward passes, (2) interleaved buffering for reduced dynamic random-access memory (DRAM) access latencies, and (3) zero-skipping for minimal gradient computation. Implemented on a VCU118 Field Programmable Gate Array (FPGA) at 100 MHz and validated using You Only Look Once version 2-tiny (YOLOv2-tiny) on the PASCAL Visual Object Classes (VOC) dataset, our normalization accelerator achieves a 72% reduction in processing time and 83% lower power consumption compared to a software normalization implementation on a 2.4 GHz Intel Central Processing Unit (CPU), while maintaining accuracy (0.51% mean Average Precision (mAP) drop at 32-bit floating point (FP32), 1.35% at 16-bit brain floating point (bfloat16)). When integrated into a neural processing unit (NPU), the design demonstrates 63% and 97% performance improvements over AMD CPU and Reduced Instruction Set Computing-V (RISC-V) implementations, respectively. These results confirm that our proposed BN hardware design enables efficient, high-accuracy, and power-saving on-device training for modern CNNs. Our results demonstrate that efficient hardware implementation of standard batch normalization is achievable without sacrificing accuracy, enabling practical on-device CNN training with significantly reduced computational and power requirements.
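The statistics and gradient computations that the abstract calls challenging can be sketched in software for a single channel. This is a plain reference model of standard batch normalization, not the accelerator's datapath. Reading "zero-skipping" as skipping zero-valued upstream gradients in the reduction loop is an assumption, and the function names are hypothetical.

```python
import math


def bn_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch-norm forward pass over one channel of a mini-batch."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(xi - mean) * inv_std for xi in x]
    y = [gamma * xh + beta for xh in x_hat]
    return y, x_hat, inv_std


def bn_backward(dy, x_hat, inv_std, gamma=1.0):
    """Standard batch-norm backward pass.

    The reduction loop skips zero upstream gradients (an assumed reading of
    'zero-skipping'); skipping zeros does not change the sums.
    """
    n = len(dy)
    sum_dy = 0.0
    sum_dy_xhat = 0.0
    for g, xh in zip(dy, x_hat):
        if g == 0.0:
            continue  # a zero gradient contributes nothing to either sum
        sum_dy += g
        sum_dy_xhat += g * xh
    scale = gamma * inv_std / n
    dx = [scale * (n * g - sum_dy - xh * sum_dy_xhat)
          for g, xh in zip(dy, x_hat)]
    dgamma, dbeta = sum_dy_xhat, sum_dy
    return dx, dgamma, dbeta


if __name__ == "__main__":
    y, x_hat, inv_std = bn_forward([1.0, 2.0, 3.0, 4.0])
    dx, dgamma, dbeta = bn_backward([0.0, 1.0, 0.0, -2.0], x_hat, inv_std)
    print(dx, dgamma, dbeta)
```

Both passes need full mini-batch reductions (mean, variance, and the two gradient sums) before any output element can be produced, which is why buffering and memory access dominate a hardware design.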
The aim of this article is to explore potential directions for the development of artificial intelligence (AI). It points out that, while current AI can handle the statistical properties of complex systems, it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns. The article also discusses a potential path of AI development in the engineering domain. Based on the existing understanding of the principles of multilevel complexity, this article suggests that consistency among the logical structures of datasets, AI models, model-building software, and hardware will be an important AI development direction and is worthy of careful consideration.
Due to the interdependency of frame synchronization (FS) and channel estimation (CE), joint FS and CE (JFSCE) schemes have been proposed to enhance both functions and thereby boost the overall performance of wireless communication systems. Although traditional JFSCE schemes alleviate the mutual influence between FS and CE, they are deficient in dealing with hardware imperfection (HI) and a deterministic line-of-sight (LOS) path. To tackle this challenge, we propose a cascaded ELM-based JFSCE to alleviate the influence of HI in the Rician fading channel scenario. Specifically, the conventional JFSCE method is first employed to extract the initial features, which form the non-neural-network (non-NN) solutions for FS and CE, respectively. Then, the ELM-based networks, named FS-NET and CE-NET, are cascaded to capture the NN solutions of FS and CE. Simulation and analysis results show that, compared with conventional JFSCE methods, the proposed cascaded ELM-based JFSCE significantly reduces the error probability of FS and the normalized mean square error (NMSE) of CE, even under parameter variations.
Massive connectivity and limited energy pose significant challenges to deploying enormous numbers of devices in an energy-efficient and environmentally friendly manner in the Internet of Things (IoT). Motivated by these challenges, this paper investigates the energy efficiency (EE) maximization problem for downlink cooperative non-orthogonal multiple access (C-NOMA) systems with hardware impairments (HIs). The base station (BS) communicates with several users via a half-duplex (HD) amplify-and-forward (AF) relay. We formulate the EE maximization problem of the system under HIs by jointly optimizing the transmit power and power allocation coefficient (PAC) at the BS and the transmit power at the relay. The original EE maximization problem is non-convex, which makes it challenging to obtain the optimal solution directly. First, we use fractional programming to convert the EE maximization problem into a series of subproblems in subtractive form. Then, variable substitution and the block coordinate descent (BCD) method are used to handle the subproblems. Next, a resource allocation algorithm is proposed to maximize the EE of the system. Finally, simulation results show that the proposed algorithm outperforms the downlink cooperative orthogonal multiple access (C-OMA) scheme.
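The fractional-programming step (turning a ratio objective into a series of subtractive-form subproblems) can be illustrated with a Dinkelbach-style iteration. The toy model below is an assumption made purely for illustration: a single link with rate log2(1 + gain·p) and total power p + p_circuit, not the C-NOMA relay system with hardware impairments studied in the paper. The function names and default values are hypothetical.

```python
import math


def rate(p, gain):
    """Toy achievable rate in bit/s/Hz for transmit power p (assumed model)."""
    return math.log2(1.0 + gain * p)


def dinkelbach_ee(gain=10.0, p_circuit=0.5, p_max=2.0, tol=1e-9, max_iter=100):
    """Maximize rate(p) / (p + p_circuit) over 0 <= p <= p_max.

    Each iteration solves the subtractive-form subproblem
        max_p  rate(p) - lam * (p + p_circuit)
    in closed form, then updates lam to the ratio achieved at that p.
    """
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        if lam > 0.0:
            # Stationary point of the concave subproblem, clipped to the box.
            p = 1.0 / (lam * math.log(2.0)) - 1.0 / gain
            p = min(max(p, 0.0), p_max)
        else:
            p = p_max  # with lam = 0 the subproblem just maximizes the rate
        gap = rate(p, gain) - lam * (p + p_circuit)
        if abs(gap) < tol:
            break  # lam equals the best achievable ratio (within tol)
        lam = rate(p, gain) / (p + p_circuit)
    return p, lam


if __name__ == "__main__":
    p_opt, ee_opt = dinkelbach_ee()
    print(f"p = {p_opt:.4f}, EE = {ee_opt:.4f}")
```

The iteration stops when the subproblem's optimal value reaches zero, which is the standard optimality condition for this kind of ratio maximization. In the paper's multi-variable setting, the closed-form inner step would be replaced by the variable substitution and BCD procedure the abstract describes.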
The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference for Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead, so their efficiency is bounded by computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, two-stage model compression is developed to effectively reduce memory requirements and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme replaces bit-shifting with only indexing and addition operations to achieve low-latency computation and efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2× to 10× improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
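A rough sketch of how power-of-two weights allow a dot product to be computed with table indexing and additions only is given below. It is not the paper's IOS-PoT or P-LUT design. The 2-bit exponent range, the 4-bit activation range, the table layout, and all function names are assumptions for illustration.

```python
import math

MAX_SHIFT = 3  # assumed exponent range: weights become +/- 2**0 ... 2**-3


def pot_quantize(w):
    """Map a weight to (sign, k), meaning sign * 2**-k; zero maps to sign 0."""
    if w == 0.0:
        return 0, 0
    sign = 1 if w > 0 else -1
    k = round(-math.log2(abs(w)))          # nearest exponent in the log domain
    k = min(max(k, 0), MAX_SHIFT)          # clamp to the representable range
    return sign, k


def build_product_lut(act_levels=16):
    """Precompute a * 2**(MAX_SHIFT - k) for every activation a and code k.

    The common factor 2**MAX_SHIFT keeps every table entry an integer.
    """
    return [[a << (MAX_SHIFT - k) for a in range(act_levels)]
            for k in range(MAX_SHIFT + 1)]


def lut_dot(acts, codes, lut):
    """Dot product using only table lookups and additions at inference time."""
    acc = 0
    for a, (sign, k) in zip(acts, codes):
        if sign > 0:
            acc += lut[k][a]
        elif sign < 0:
            acc -= lut[k][a]
    return acc / (1 << MAX_SHIFT)          # undo the common integer scale once


if __name__ == "__main__":
    weights = [0.5, -0.9, 0.1, 0.3]
    acts = [4, 2, 8, 3]                    # unsigned 4-bit activations
    codes = [pot_quantize(w) for w in weights]
    print(codes, lut_dot(acts, codes, build_product_lut()))
```

The shifts happen once, when the table is built. The inner loop then touches only the table and an accumulator, which is the trade the abstract describes: a small amount of memory in exchange for removing per-multiply shift logic.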
The Internet of Vehicles (IoV) will carry a large amount of security- and privacy-related data, which makes secure communication between IoV terminals increasingly critical. This paper studies joint beamforming for physical-layer security transmission in the coexistence of Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communication with Reconfigurable Intelligent Surface (RIS) assistance, taking hardware impairments into account. A communication model for RIS-assisted physical-layer security transmission is established for the case where an eavesdropping user is present and the base station antenna has hardware impairments. Based on this model, we propose to maximize the V2I physical-layer security transmission rate. To solve the coupled non-convex optimization problem, an alternating optimization algorithm based on second-order cone programming and semidefinite relaxation is proposed to obtain the V2I base station transmit precoding and the RIS reflection phase shift matrix. Finally, simulation results are presented to verify the convergence and superiority of the proposed algorithm and to analyze the impact of system parameters on the V2I physical-layer security transmission rate. The simulation results further demonstrate that the proposed robust beamforming algorithm, which accounts for hardware impairments, achieves an average performance improvement of 0.7 dB over a non-robust design. Furthermore, increasing the number of RIS reflective units from 10 to 50 results in an almost 2 dB enhancement in secure transmission rate.
The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm and significantly impacts the implementation performance of AES chips. To design a high-performance S-box, this paper proposes a segmented optimization implementation of the S-box based on the composite field inverse operation. The proposed S-box implementation is modeled in Verilog and, once simulation has confirmed its correctness, synthesized using Design Compiler. The synthesis results show that, compared to several current S-box implementation schemes, the proposed implementation significantly reduces the area overhead and critical path delay and thus achieves higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.
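For readers who want a software reference against which an S-box circuit could be checked, the sketch below computes the AES S-box from its definition: multiplicative inversion in GF(2^8) followed by the affine transformation. It inverts directly in GF(2^8) by exponentiation. The composite field decomposition in the abstract is a hardware optimization of that same inverse and is not modeled here. Function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) as a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def sbox(x):
    """AES SubBytes for one byte: field inverse, then the affine transform."""
    inv = gf_inv(x)
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8))
               ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


if __name__ == "__main__":
    print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))  # 0x63 0x7c 0xed
```

A table generated this way (all 256 inputs) is a convenient golden model for comparing against the simulation output of a hardware implementation.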
A hardware demodulation method for 2-D edge detection is proposed. The filtering step and the differential step are implemented in a hardware circuit. This demodulation circuit simplifies the edge finder and shortens the measuring cycle. A calibration method for scale setting is also presented, and by measuring calibrated objects, the demodulation errors and an error correction table are obtained.
The key to constructing and developing a campus information network is the design and optimization of the network hardware system. This paper studies the network system structure design, the server system structure design, and the network export design, and discusses network hardware system design and optimization for universities of different scales according to their practical demands. The objective is a network hardware system that meets demand and is fully utilized.
The interpretation of images of spines fixed with metallic hardware forms an increasing share of daily practice in a busy imaging department. Radiologists are required to be familiar with the instrumentation and operative options used in spinal fixation and fusion procedures, especially those used at their own institution. This is critical in evaluating the position of implants and potential complications associated with the operative approaches and spinal fixation devices used. Thus, the radiologist can play an important role in patient care and outcome. This review outlines the advantages and disadvantages of commonly used imaging methods, reports on the best yield for each modality, and explains how to overcome the problems associated with the presence of metallic hardware during imaging. Baseline radiographs are essential as the reference point for evaluating future studies should patients develop symptoms suggesting possible complications. They may justify further imaging workup with computed tomography, magnetic resonance and/or nuclear medicine studies, as the evaluation of a patient with a spinal implant involves a multi-modality approach. This review describes imaging features of potential complications associated with spinal fusion surgery as well as the instrumentation used. This basic knowledge aims to help radiologists approach everyday practice in clinical imaging.
The self-healing strategy is a key component in designing bio-inspired embryonics circuits structured as cell arrays. However, existing self-healing strategies for embryonics circuits mainly focus on permanent faults inside the modules of cells, such as the function module and the configuration register, while little attention is paid to transient faults. From the point of view of hardware utilization efficiency, permanently eliminating a cell that suffers only a transient fault, which could be repaired by a configuration mechanism, is a huge waste of hardware resources. This paper proposes a new self-healing strategy, the Fault-Cell Reutilization Self-healing Strategy (FCRSS), which provides a method for reusing transient-fault cells. The circuit structures of all the modules in the cells are described in detail. In the new strategy, the two processes of elimination and reconfiguration are combined. Within the process of fault-cell elimination, cells with transient faults in the embryonics circuit array can be reused simultaneously to replace the functions of the cells on their left side in the same row. Therefore, transient-fault cells in a transparent state can be reconfigured to realize fault-cell reutilization. Finally, a circuit simulation, a resource consumption analysis, a reliability analysis and a detailed normalization analysis are presented. The FCRSS improves the hardware utilization rate and system reliability at the expense of a small amount of hardware resources and reconfiguration time. Following the conclusion, a method for determining the optimal self-healing strategy according to the environmental conditions is presented.
Scientific research requires the collection of data in order to study, monitor, analyze, describe, or understand a particular process or event. Data collection efforts are often a compromise: manual measurements can be time-consuming and labor-intensive, resulting in data being collected at a low frequency, while automating the data-collection process can reduce labor requirements and increase the frequency of measurements, but at the added expense of electronic data-collecting instrumentation. Rapid advances in electronic technologies have resulted in a variety of new and inexpensive sensing, monitoring, and control capabilities which offer opportunities for implementation in agricultural and natural-resource research applications. An Open Source Hardware project called Arduino consists of a programmable microcontroller development platform, expansion capability through add-on boards, and a programming development environment for creating custom microcontroller software. All circuit-board and electronic component specifications, as well as the programming software, are open-source and freely available for anyone to use or modify. Inexpensive sensors and the Arduino development platform were used to develop several inexpensive, automated sensing and datalogging systems for use in agricultural and natural-resources related research projects. Systems were developed and implemented to monitor soil-moisture status of field crops for irrigation scheduling and crop-water use studies, to measure daily evaporation-pan water levels for quantifying evaporative demand, and to monitor environmental parameters under forested conditions. These studies demonstrate the usefulness of automated measurements, and offer guidance for other researchers in developing inexpensive sensing and monitoring systems to further their research.
Hardware/software partitioning is an important step in the design of embedded systems. In this paper, the hardware/software partitioning problem is modeled as a constrained binary integer programming problem, which is then converted into an equivalent unconstrained binary integer programming problem by a penalty method. A local search method, HSFM, is developed to obtain a discrete local minimizer of the unconstrained binary integer programming problem. Next, an auxiliary function, which has the same global optimal solutions as the unconstrained binary integer programming problem, is constructed, and its properties are studied. We show that applying HSFM to minimize the auxiliary function can successfully escape from previous local optima as the parameter value increases. Finally, a discrete dynamic convexized method is developed to solve the hardware/software partitioning problem. Computational results and comparisons indicate that the proposed algorithm obtains high-quality solutions.
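The first two steps of the abstract (a constrained 0-1 model turned into an unconstrained one by a penalty term, then a local search for a discrete local minimizer) can be sketched as follows. The cost model (minimize execution time under a hardware area budget), the numbers, and the simple 1-flip neighborhood are assumptions for illustration. The 1-flip search is a generic stand-in, not the paper's HSFM, and the auxiliary-function escape step is not shown.

```python
def penalized_cost(x, sw_time, hw_time, hw_area, area_limit, mu):
    """Unconstrained objective: total time plus a penalty on area overflow.

    x[i] = 1 places task i in hardware, x[i] = 0 keeps it in software.
    mu must be large enough that no infeasible point beats a feasible one.
    """
    time = sum(h if xi else s for xi, s, h in zip(x, sw_time, hw_time))
    area = sum(a for xi, a in zip(x, hw_area) if xi)
    return time + mu * max(0, area - area_limit)


def one_flip_local_search(x, cost):
    """Flip single bits while any flip lowers the cost (discrete local minimum)."""
    x = list(x)
    best = cost(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                  # tentatively move task i
            candidate = cost(x)
            if candidate < best:
                best = candidate       # keep the improving flip
                improved = True
            else:
                x[i] ^= 1              # undo the flip
    return x, best


if __name__ == "__main__":
    sw_time = [10, 8, 6, 4]            # hypothetical per-task software times
    hw_time = [2, 3, 1, 2]             # hypothetical per-task hardware times
    hw_area = [5, 4, 3, 2]             # hypothetical per-task hardware areas

    def cost(x):
        return penalized_cost(x, sw_time, hw_time, hw_area,
                              area_limit=8, mu=100)

    print(one_flip_local_search([0, 0, 0, 0], cost))  # prints ([1, 0, 1, 0], 15)
```

A search like this stops at the first point with no improving single flip, which in general is only a local minimizer. That limitation is what motivates the auxiliary function and the dynamic convexized method described in the abstract.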
Although a few good schemes exist to protect the kernel hooks of operating systems, attackers are still able to circumvent existing defense mechanisms with spurious context information. To address this challenge, this paper proposes a framework, called HookIMA, to detect compromised kernel hooks by using hardware debugging features. The key contribution of the work is that context information is captured from hardware instead of from relatively vulnerable kernel data. Using commodity hardware, a proof-of-concept prototype system of HookIMA has been developed. This prototype handles 3,082 dynamic control-flow transfers with related hooks in the kernel space. Experiments show that HookIMA is capable of detecting compromised kernel hooks caused by kernel rootkits. Performance evaluations with UnixBench indicate that the runtime overhead introduced by HookIMA is about 21.5%.
In harsh-environment applications such as earth-orbiting and deep-space satellites and underwater vehicles, and under strong electromagnetic interference and temperature stress, circuit faults appear easily. Without self-repair over the mission duration, circuit faults will inevitably lead to serious losses of availability or impede mission success. Traditional fault-repair methods based on redundant fault-tolerant techniques are straightforward to implement, yet their area, power and weight costs can be excessive. Moreover, they rely on plug-in or component-level circuits to realize redundant backup, which limits their applicability. Hence, a novel self-repair technology based on evolvable hardware (EHW) and reparation balance technology (RBT) is proposed. Its cost is low, and fault self-repair of various circuits and devices can be realized through dynamic configuration. Making full use of the fault signals, a correcting circuit can be found through the EHW technique to balance and compensate the faulty output signals. In this paper, the self-repair model based on EHW and RBT is analyzed, the specific self-repair strategy is studied, the corresponding circuit fault self-repair system is designed, and typical faults are simulated and analyzed with actual electronic devices. Simulation results demonstrate that the proposed fault self-repair strategy is feasible. Compared to traditional techniques, fault self-repair based on EHW consumes fewer hardware resources, and the scope of fault self-repair is expanded significantly.
The configuration method of an Embryonic Array (EA) directly affects its reliability and hardware consumption. At present, EA configuration design lacks a quantitative analysis method. In order to reasonably optimize EA configuration design, an EA configuration optimization design method based on the constraints of EA hardware consumption and reliability is proposed. Through analysis of the EA working process and composition, quantitative analyses of EA reliability and hardware consumption are completed. Based on the constraints of EA hardware consumption and reliability, a mathematical model of EA configuration optimization design is established, which transforms EA configuration optimization design into an integer nonlinear programming problem. The adaptive mutation operator and crossover operator are selected according to the fitness value of each individual awaiting mutation in the population, and a novel Modified Adaptive Differential Evolution (MADE) algorithm is proposed to solve the EA configuration optimization design problem. Simulation experiments and analysis indicate that MADE effectively improves the speed, accuracy and stability of the algorithm. Moreover, the proposed EA configuration optimization design method can select the most reasonable EA configuration design and can play an important guiding role in EA optimization design.
This paper presents a simple yet effective decoding method for general quasi-cyclic low-density parity-check (QC-LDPC) codes, which not only achieves high hardware utility efficiency (HUE) but also greatly reduces the number of memory blocks without any performance degradation. The main idea is to split the check matrix into several row blocks and then perform the improved message passing computations sequentially, block by block. With the improved decoding algorithm, the sequential tie between the two-phase computations is broken, so the two phases can be overlapped, which yields high HUE. Two overlapping schemes are also presented, each of which suits a different situation. In addition, an efficient memory arrangement scheme is proposed to reduce the large memory block requirement of the LDPC decoder. As an example, for the rate-0.4 LDPC code selected from Chinese Digital TV Terrestrial Broadcasting (DTTB), our decoding saves over 80% of the memory blocks compared with conventional decoding, and the decoder achieves 0.97 HUE. Finally, the rate-0.4 LDPC decoder is implemented on an FPGA device EP2S30 (speed grade -5). Using 8 row processing units, the decoder can achieve a maximum net throughput of 28.5 Mbps at 20 iterations.
For polar codes, the performance of successive cancellation list (SCL) decoding is capable of approaching that of maximum likelihood decoding. However, existing hardware architectures for SCL decoding suffer from high hardware complexity because they calculate L decoding paths simultaneously, which makes them ill-suited to devices with limited logic resources, such as field programmable gate arrays (FPGAs). In this paper, we propose a low-complexity list-serial pipelined hardware architecture for SCL decoding, in which serial calculation and pipelined operation are combined to strike a balance between complexity and latency. Moreover, we employ only one successive cancellation (SC) decoder core without L×L crossbars, and reduce the number of inputs of the metric sorter from 2L to L+2. Finally, the FPGA implementations show that the hardware resource consumption is significantly reduced with negligible decoding performance loss.
A Hardware Trojan (HT) is a special module intentionally implanted into a chip or an electronic system. The module can be exploited by an attacker to achieve destructive functions. Unfortunately, an HT is difficult to detect due to its minimal resource occupation. In order to achieve accurate detection with high efficiency, an HT detection method based on the electromagnetic leakage of the chip is proposed in this paper. First, dimensionality reduction and feature extraction of the electromagnetic leakage signals in each group (template chip, Trojan-free chip and target chip) are performed by principal component analysis (PCA). Then, the Mahalanobis distances between the template group and the other groups are calculated. Finally, the Mahalanobis distances are compared against a threshold to determine whether an HT has been implanted into the target chip. In addition, the concept of HT Detection Quality (HTDQ) is proposed to analyze and compare the performance of different detection methods. Our experimental results indicate that the accuracy of this detection method is 91.93% and the time consumption is 0.042 s on average, which gives a high HTDQ compared with three other methods.
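The distance-and-threshold step of the detection flow can be sketched in a few lines. The sketch assumes the PCA stage has already reduced each electromagnetic trace to two feature values, so only the Mahalanobis distance and the threshold comparison are shown. The sample points, the threshold, and the function names are hypothetical and do not come from the paper.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of 2-D features."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from the template distribution."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy        # must be non-zero (non-degenerate data)
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def looks_infected(point, mean, cov, threshold):
    """Flag a chip whose features lie farther from the template than allowed."""
    return mahalanobis(point, mean, cov) > threshold


if __name__ == "__main__":
    template = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    mean, cov = mean_and_cov(template)
    print(mahalanobis((3.0, 1.0), mean, cov))      # about 1.732
    print(looks_infected((6.0, 6.0), mean, cov, threshold=3.0))
```

Unlike a plain Euclidean distance, the Mahalanobis distance scales each direction by the spread of the template group, so a small deviation along a low-variance feature counts for more than the same deviation along a noisy one.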
Funding (on-device batch normalization accelerator study): National Research Foundation of Korea (NRF) grant for RLRC funded by the Korea government (MSIT) (No. 2022R1A5A8026986, RLRC); Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01304, Development of Self-Learnable Mobile Recursive Neural Network Processor Technology); MSIT (Ministry of Science and ICT), Republic of Korea, under the Grand Information Technology Research Center support program (IITP-2024-2020-0-01462, Grand-ICT) supervised by the IITP; Korea Technology and Information Promotion Agency for SMEs (TIPA); Korean government (Ministry of SMEs and Startups) Smart Manufacturing Innovation R&D (RS-2024-00434259).
Abstract: On-device Artificial Intelligence (AI) accelerators capable of not only inference but also training neural network models are in increasing demand in the industrial AI field, where frequent retraining is crucial due to frequent production changes. Batch normalization (BN) is fundamental to training convolutional neural networks (CNNs), but its implementation in compact accelerator chips remains challenging due to computational complexity, particularly in calculating statistical parameters and gradients across mini-batches. Existing accelerator architectures either compromise the training accuracy of CNNs through approximations or require substantial computational resources, limiting their practical deployment. We present a hardware-optimized BN accelerator that maintains training accuracy while significantly reducing computational overhead through three novel techniques: (1) resource sharing for efficient resource utilization across forward and backward passes, (2) interleaved buffering for reduced dynamic random-access memory (DRAM) access latencies, and (3) zero-skipping for minimal gradient computation. Implemented on a VCU118 Field Programmable Gate Array (FPGA) at 100 MHz and validated using You Only Look Once version 2-tiny (YOLOv2-tiny) on the PASCAL Visual Object Classes (VOC) dataset, our normalization accelerator achieves a 72% reduction in processing time and 83% lower power consumption compared to a 2.4 GHz Intel Central Processing Unit (CPU) software normalization implementation, while maintaining accuracy (0.51% mean Average Precision (mAP) drop at floating-point 32 bits (FP32), 1.35% at brain floating-point 16 bits (bfloat16)). When integrated into a neural processing unit (NPU), the design demonstrates 63% and 97% performance improvements over AMD CPU and Reduced Instruction Set Computing-V (RISC-V) implementations, respectively. These results confirm that our proposed BN hardware design enables efficient, high-accuracy, and power-saving on-device training for modern CNNs. Our results demonstrate that efficient hardware implementation of standard batch normalization is achievable without sacrificing accuracy, enabling practical on-device CNN training with significantly reduced computational and power requirements.
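The mini-batch statistics and gradients that the abstract identifies as the costly part of BN can be illustrated with a minimal software sketch of standard batch normalization for a single channel. This is a reference model only, written in plain Python under our own assumptions; the function names `bn_forward` and `bn_backward` and the `eps` default are illustrative and do not come from the paper, and the sketch does not model the resource-sharing, interleaved-buffering, or zero-skipping techniques.

```python
import math

def bn_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN for one channel: normalize with mini-batch statistics."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # biased variance over the mini-batch
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [gamma * h + beta for h in x_hat]
    return y, (x_hat, inv_std, gamma)           # cache what the backward pass needs

def bn_backward(dy, cache):
    """Gradients of the loss w.r.t. the inputs, gamma and beta."""
    x_hat, inv_std, gamma = cache
    n = len(dy)
    dbeta = sum(dy)                                   # reduction 1 over the mini-batch
    dgamma = sum(g * h for g, h in zip(dy, x_hat))    # reduction 2 over the mini-batch
    dx = [gamma * inv_std / n * (n * g - dbeta - h * dgamma)
          for g, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta
```

Both passes need full reductions over the mini-batch before any output element can be produced, which is one reason BN is awkward to stream through a compact accelerator.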
Abstract: The aim of this article is to explore potential directions for the development of artificial intelligence (AI). It points out that, while current AI can handle the statistical properties of complex systems, it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns. The article also discusses a potential path of AI development in the engineering domain. Based on the existing understanding of the principles of multilevel complexity, this article suggests that consistency among the logical structures of datasets, AI models, model-building software, and hardware will be an important AI development direction and is worthy of careful consideration.
Funding: supported in part by the Sichuan Science and Technology Program (Grant No. 2023YFG0316); the Industry-University Research Innovation Fund of China University (Grant No. 2021ITA10016); the Key Scientific Research Fund of Xihua University (Grant No. Z1320929); and the Special Funds of Industry Development of Sichuan Province (Grant No. zyf-2018-056).
Abstract: Due to the interdependency of frame synchronization (FS) and channel estimation (CE), joint FS and CE (JFSCE) schemes have been proposed to enhance both functions and thereby boost the overall performance of wireless communication systems. Although traditional JFSCE schemes alleviate the mutual influence between FS and CE, they show deficiencies in dealing with hardware imperfection (HI) and a deterministic line-of-sight (LOS) path. To tackle this challenge, we propose a cascaded extreme learning machine (ELM)-based JFSCE to alleviate the influence of HI in the Rician fading channel scenario. Specifically, the conventional JFSCE method is first employed to extract the initial features, which form the non-neural-network (non-NN) solutions for FS and CE, respectively. Then, the ELM-based networks, named FS-NET and CE-NET, are cascaded to capture the NN solutions of FS and CE. Simulation and analysis results show that, compared with conventional JFSCE methods, the proposed cascaded ELM-based JFSCE significantly reduces the error probability of FS and the normalized mean square error (NMSE) of CE, even under parameter variations.
Funding: partially supported by the National Natural Science Foundation of China under Grant 61701064; the Chongqing Natural Science Foundation under Grant cstc2019jcyj-msxmX0264; and the Sichuan Science and Technology Program under Grant 2022YFQ0017.
Abstract: Massive connectivity and limited energy pose significant challenges to deploying the enormous number of devices in the Internet of Things (IoT) in an energy-efficient and environmentally friendly manner. Motivated by these challenges, this paper investigates the energy efficiency (EE) maximization problem for downlink cooperative non-orthogonal multiple access (C-NOMA) systems with hardware impairments (HIs). The base station (BS) communicates with several users via a half-duplex (HD) amplify-and-forward (AF) relay. We formulate the EE maximization problem of the system under HIs by jointly optimizing the transmit power and power allocation coefficient (PAC) at the BS and the transmit power at the relay. The original EE maximization problem is non-convex, which makes it challenging to obtain the optimal solution directly. We therefore first use fractional programming to convert the EE maximization problem into a series of subtractive-form subproblems. Then, variable substitution and the block coordinate descent (BCD) method are used to handle the subproblems. Next, a resource allocation algorithm is proposed to maximize the EE of the system. Finally, simulation results show that the proposed algorithm outperforms the downlink cooperative orthogonal multiple access (C-OMA) scheme.
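The step of converting a ratio objective into a series of subtractive-form subproblems is the idea behind Dinkelbach-style fractional programming. The sketch below applies it to a deliberately simplified toy problem: a single point-to-point link with rate log2(1 + g·p) and consumed power p + p_circuit. This toy model, the closed-form inner solution, and all names (`dinkelbach_ee`, `g`, `p_circuit`, `p_max`) are assumptions made for illustration; the paper's C-NOMA relay system with hardware impairments, variable substitution, and BCD is considerably more involved.

```python
import math

def rate(p, g):
    """Toy achievable rate in bit/s/Hz for transmit power p and channel gain g."""
    return math.log2(1.0 + g * p)

def dinkelbach_ee(g, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize rate(p) / (p + p_circuit) over 0 <= p <= p_max."""
    lam = 0.0                      # current estimate of the optimal energy efficiency
    p = p_max
    for _ in range(max_iter):
        # Subtractive-form subproblem: maximize rate(p) - lam * (p + p_circuit).
        # For this concave toy objective the maximizer has a closed form.
        if lam > 0.0:
            p = min(p_max, max(0.0, 1.0 / (lam * math.log(2)) - 1.0 / g))
        else:
            p = p_max
        gap = rate(p, g) - lam * (p + p_circuit)
        lam = rate(p, g) / (p + p_circuit)   # update the ratio estimate
        if abs(gap) < tol:                   # gap -> 0 at the optimal ratio
            break
    return p, lam

p_opt, ee_opt = dinkelbach_ee(g=10.0, p_circuit=1.0, p_max=5.0)
```

Each outer iteration solves one subtractive subproblem and then updates the ratio estimate; the loop stops when the subproblem's optimal value reaches zero, which characterizes the optimal energy efficiency.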
Funding: This work was supported by the Open Fund Project of the State Key Laboratory of Intelligent Vehicle Safety Technology under Grant No. IVSTSKL-202311; the Key Projects of the Science and Technology Research Programme of the Chongqing Municipal Education Commission under Grant No. KJZD-K202301505; the Cooperation Project between Chongqing Municipal Undergraduate Universities and Institutes Affiliated to the Chinese Academy of Sciences in 2021 under Grant No. HZ2021015; and the Chongqing Graduate Student Research Innovation Program under Grant No. CYS240801.
Abstract: The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference for Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead, and their efficiency is bounded by the bottleneck of computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, two-stage model compression is developed to effectively reduce the memory requirement and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme is leveraged to replace bit-shifting with only indexing and addition operations, achieving low-latency computation and enabling efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2× to 10× improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
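The general idea of PoT quantization and of replacing shifts with table look-ups can be sketched in a few lines. The code below is a generic illustration, not the paper's IOS-PoT or P-LUT design: the rounding rule (nearest exponent in the log domain), the exponent range, and the names `pot_quantize`, `build_product_lut`, and `lut_dot` are all assumptions made for this example.

```python
import math

def pot_quantize(w, min_exp=-4, max_exp=0):
    """Map a weight to (sign, exponent) so that w is approximated by sign * 2**exponent."""
    if w == 0:
        return 0, min_exp                      # sign 0 encodes an exactly-zero weight
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))             # nearest power of two in the log domain
    return sign, max(min_exp, min(max_exp, exp))

def build_product_lut(activation_levels, min_exp=-4, max_exp=0):
    """Precompute activation * 2**exponent for every (activation, exponent) pair."""
    return {(a, e): a * 2.0 ** e
            for a in activation_levels
            for e in range(min_exp, max_exp + 1)}

def lut_dot(activations, q_weights, lut):
    """Dot product using only table indexing plus additions and subtractions."""
    total = 0.0
    for a, (sign, exp) in zip(activations, q_weights):
        if sign > 0:
            total += lut[(a, exp)]
        elif sign < 0:
            total -= lut[(a, exp)]
    return total

weights = [0.5, -0.9, 0.3]
q = [pot_quantize(w) for w in weights]          # [(1, -1), (-1, 0), (1, -2)]
lut = build_product_lut(range(4))               # 2-bit unsigned activations, an assumption
result = lut_dot([1, 2, 3], q, lut)             # 0.5 - 2.0 + 0.75
```

Because every quantized weight is a signed power of two, the table needs only one entry per activation level and exponent, so its size grows with the number of exponent levels rather than with the number of distinct weight values.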
Funding: the Key Research and Development Plan of Jiangsu Province, grant number BE2020084-2; the National Key Research and Development Program of China, grant number 2020YFB1600104.
Abstract: The Internet of Vehicles (IoV) will carry a large amount of security- and privacy-related data, which makes secure communication between IoV terminals increasingly critical. This paper studies joint beamforming for physical-layer security transmission in the coexistence of Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communication with Reconfigurable Intelligent Surface (RIS) assistance, taking hardware impairments into account. A communication model for RIS-assisted physical-layer security transmission is established for the case in which an eavesdropping user is present and the base station antenna has hardware impairments. Based on this model, we propose to maximize the V2I physical-layer security transmission rate. To solve the coupled non-convex optimization problem, an alternating optimization algorithm based on second-order cone programming and semidefinite relaxation is proposed to obtain the optimal V2I base station transmit precoding and RIS reflection phase-shift matrix. Finally, simulation results are presented to verify the convergence and superiority of the proposed algorithm and to analyze the impact of system parameters on the V2I physical-layer security transmission rate. The simulation results further demonstrate that the proposed robust beamforming algorithm, which considers hardware impairments, achieves an average performance improvement of 0.7 dB over a non-robustly designed algorithm. Furthermore, increasing the number of RIS reflective units from 10 to 50 results in an almost 2 dB enhancement in secure transmission rate.
Abstract: The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm, significantly impacting the implementation performance of AES chips. To design a high-performance S-box, this paper proposes a segmented optimization implementation of the S-box based on the composite field inverse operation. The proposed S-box implementation is modeled in Verilog and, once the simulation results are confirmed to be correct, synthesized using Design Compiler. The synthesis results show that, compared to several current S-box implementation schemes, the proposed implementation significantly reduces the area overhead and critical path delay and thus achieves higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.
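For reference, the S-box is defined as a multiplicative inverse in GF(2^8) followed by an affine transformation, and a small software model is a convenient golden reference when checking a hardware implementation. The sketch below computes the inverse by direct exponentiation (a^254) rather than by the composite-field decomposition the paper optimizes, so it is a behavioral model only; the function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse computed as a**254; by convention 0 maps to 0."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(x):
    """AES SubBytes: field inverse followed by the affine transformation."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX_TABLE = [sbox(x) for x in range(256)]   # e.g. SBOX_TABLE[0x53] == 0xED
```

A table generated this way can be compared entry by entry against the output of a simulated hardware S-box for all 256 inputs.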
Abstract: A hardware demodulation method for 2-D edge detection is proposed. The filtering step and the differential step are implemented in hardware circuitry. This demodulation circuit simplifies the edge finder and shortens the measuring cycle. A calibration method for scale setting is also presented, and by measuring calibrated objects, the demodulation errors and an error correction table are obtained.
Abstract: The key to constructing and developing a campus information network is how to design and optimize the network hardware system. This paper mainly studies the network system structure design, the server system structure design, and the network export design, and discusses network hardware system design and optimization for universities of different scales according to their practical demands. The objective is for the network hardware system to meet demand and be fully utilized.
Abstract: The interpretation of spinal images with metallic fixation hardware forms an increasing share of daily practice in a busy imaging department. Radiologists are required to be familiar with the instrumentation and operative options used in spinal fixation and fusion procedures, especially those used in their own institution. This is critical in evaluating the position of implants and potential complications associated with the operative approaches and spinal fixation devices used. Thus, the radiologist can play an important role in patient care and outcome. This review outlines the advantages and disadvantages of commonly used imaging methods, reports on the best yield for each modality, and describes how to overcome the problems associated with the presence of metallic hardware during imaging. Baseline radiographs are essential as they are the reference point for evaluating future studies should patients develop symptoms suggesting possible complications. They may justify further imaging workup with computed tomography, magnetic resonance and/or nuclear medicine studies, as the evaluation of a patient with a spinal implant involves a multi-modality approach. This review describes imaging features of potential complications associated with spinal fusion surgery as well as the instrumentation used. This basic knowledge aims to help radiologists approach everyday practice in clinical imaging.
Funding: co-supported by the National Natural Science Foundation of China (Nos. 61202001, 61402226) and the Fundamental Research Funds for the Central Universities of NUAA of China (Nos. NS2018026, NS2012024).
Abstract: The self-healing strategy is a key component in designing bio-inspired embryonics circuits with a cell-array structure. However, the existing self-healing strategies of embryonics circuits mainly focus on permanent faults inside the modules of cells, such as the function module and the configuration register, while little attention is paid to transient faults. From the point of view of hardware utilization efficiency, permanently eliminating a cell that has only suffered a transient fault, which could be repaired by a configuration mechanism, is a huge waste of hardware resources. A new self-healing strategy, the Fault-Cell Reutilization Self-healing Strategy (FCRSS), which presents a method for reusing transient fault cells, is proposed in this paper. The circuit structures of all the modules in the cells are described in detail. In the new strategy, the two processes of elimination and reconfiguration are combined. Within the process of fault-cell elimination, cells with transient faults in the embryonics circuit array can be reused simultaneously to replace the functions of the cells on their left side in the same row. Therefore, transient fault cells in a transparent state can be reconfigured to realize fault-cell reutilization. Finally, a circuit simulation, a resource consumption analysis, a reliability analysis, and a detailed normalization analysis are presented. The FCRSS can improve the hardware utilization rate and system reliability at the expense of a small amount of hardware resources and reconfiguration time. Following the conclusion, a method for determining the optimal self-healing strategy according to the environmental conditions is presented.
Abstract: Scientific research requires the collection of data in order to study, monitor, analyze, describe, or understand a particular process or event. Data collection efforts are often a compromise: manual measurements can be time-consuming and labor-intensive, resulting in data being collected at a low frequency, while automating the data-collection process can reduce labor requirements and increase the frequency of measurements, but at the added expense of electronic data-collecting instrumentation. Rapid advances in electronic technologies have resulted in a variety of new and inexpensive sensing, monitoring, and control capabilities which offer opportunities for implementation in agricultural and natural-resource research applications. An Open Source Hardware project called Arduino consists of a programmable microcontroller development platform, expansion capability through add-on boards, and a programming development environment for creating custom microcontroller software. All circuit-board and electronic component specifications, as well as the programming software, are open-source and freely available for anyone to use or modify. Inexpensive sensors and the Arduino development platform were used to develop several inexpensive, automated sensing and datalogging systems for use in agricultural and natural-resources related research projects. Systems were developed and implemented to monitor soil-moisture status of field crops for irrigation scheduling and crop-water use studies, to measure daily evaporation-pan water levels for quantifying evaporative demand, and to monitor environmental parameters under forested conditions. These studies demonstrate the usefulness of automated measurements, and offer guidance for other researchers in developing inexpensive sensing and monitoring systems to further their research.
Funding: Supported by the National Natural Science Foundation of China (11301255); the Fund by Collaborative Innovation Center of IoT Industrialization and Intelligent Production, Minjiang University (IIC1703); the Foundation of Minjiang University (MYK17032); and the Program for New Century Excellent Talents in Fujian Province University.
Abstract: Hardware/software partitioning is an important step in the design of embedded systems. In this paper, the hardware/software partitioning problem is modeled as a constrained binary integer programming problem, which is then converted into an equivalent unconstrained binary integer programming problem by a penalty method. A local search method, HSFM, is developed to obtain a discrete local minimizer of the unconstrained binary integer programming problem. Next, an auxiliary function, which has the same global optimal solutions as the unconstrained binary integer programming problem, is constructed, and its properties are studied. We show that applying HSFM to minimize the auxiliary function can successfully escape from previous local optima as the parameter value increases. Finally, a discrete dynamic convexized method is developed to solve the hardware/software partitioning problem. Computational results and comparisons indicate that the proposed algorithm can obtain high-quality solutions.
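The first two steps, a penalty reformulation followed by a discrete local search, can be illustrated on a small assumed model. In the sketch below each task runs either in software or in hardware, the objective is total execution time, and the constraint is a hardware area budget; this particular cost model, the linear penalty, the one-flip neighborhood, and every name in the code are assumptions for illustration. The abstract does not specify the paper's formulation, and the sketch implements neither HSFM itself nor the auxiliary-function and dynamic convexized steps.

```python
def penalized_cost(x, sw_time, hw_time, hw_area, area_limit, mu):
    """Unconstrained objective: total time plus mu times the area-budget violation.

    x[i] == 1 places task i in hardware, x[i] == 0 keeps it in software.
    """
    time = sum(h if xi else s for xi, s, h in zip(x, sw_time, hw_time))
    area = sum(a for xi, a in zip(x, hw_area) if xi)
    return time + mu * max(0, area - area_limit)

def one_flip_local_search(x, cost):
    """Descend by flipping one bit at a time until no single flip improves the cost."""
    x = list(x)
    best = cost(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                 # tentatively move task i to the other side
            c = cost(x)
            if c < best:
                best, improved = c, True
            else:
                x[i] ^= 1             # undo the flip
    return x, best

sw_time, hw_time, hw_area = [10, 8, 6], [2, 3, 5], [4, 3, 2]
cost = lambda x: penalized_cost(x, sw_time, hw_time, hw_area, area_limit=5, mu=100)
partition, total = one_flip_local_search([0, 0, 0], cost)   # -> [1, 0, 0], 16
```

A one-flip descent only guarantees a discrete local minimizer; escaping such points is exactly what the auxiliary function described in the abstract is meant to achieve.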
Funding: The authors would like to thank the anonymous reviewers for their insightful comments that have helped improve the presentation of this paper. The work was supported partially by the National Natural Science Foundation of China under Grants No. 61070192, No. 91018008, and No. 61170240; the National High-Tech Research Development Program of China under Grant No. 2007AA01ZA14; and the Natural Science Foundation of Beijing under Grant No. 4122041.
Abstract: Although there exist a few good schemes to protect the kernel hooks of operating systems, attackers are still able to circumvent existing defense mechanisms with spurious context information. To address this challenge, this paper proposes a framework, called HookIMA, to detect compromised kernel hooks by using hardware debugging features. The key contribution of the work is that context information is captured from hardware instead of from relatively vulnerable kernel data. Using commodity hardware, a proof-of-concept prototype system of HookIMA has been developed. This prototype handles 3 082 dynamic control-flow transfers with related hooks in the kernel space. Experiments show that HookIMA is capable of detecting compromised kernel hooks caused by kernel rootkits. Performance evaluations with UnixBench indicate that the runtime overhead introduced by HookIMA is about 21.5%.
Funding: supported by the National Natural Science Foundation of China (Nos. 61271153, 61372039)
Abstract: In harsh natural environments, such as those faced by earth-orbiting and deep-space satellites and underwater vehicles, or under strong electromagnetic interference and temperature stress, circuit faults occur easily. Without self-repair over the mission duration, circuit faults will inevitably lead to serious losses of availability or impede mission success. Traditional fault-repair methods based on redundant fault-tolerant techniques are straightforward to implement, yet their area, power, and weight costs can be excessive. Moreover, they rely on plug-in or component-level circuits to realize redundant backup, which limits their applicability. Hence, a novel self-repair technology based on evolvable hardware (EHW) and reparation balance technology (RBT) is proposed. Its cost is low, and fault self-repair of various circuits and devices can be realized through dynamic configuration. Making full use of the fault signals, a correcting circuit can be found through the EHW technique to balance and compensate for the faulty output signals. In this paper, the self-repair model based on EHW and RBT is analyzed, the specific self-repair strategy is studied, the corresponding circuit fault self-repair system is designed, and typical faults are simulated and analyzed in combination with actual electronic devices. Simulation results demonstrate that the proposed fault self-repair strategy is feasible. Compared to traditional techniques, fault self-repair based on EHW consumes fewer hardware resources, and the scope of fault self-repair is expanded significantly.
Funding: supported by the National Natural Science Foundation of China (Nos. 61372039 and 61601495)
Abstract: The configuration method of an Embryonic Array (EA) directly affects its reliability and hardware consumption. At present, EA configuration design lacks a quantitative analysis method. In order to reasonably optimize EA configuration design, an EA configuration optimization design method is proposed, which is based on the constraints of EA hardware consumption and reliability. Through the analysis of the EA working process and composition, quantitative analyses of EA reliability and hardware consumption are completed. Based on the constraints of EA hardware consumption and reliability, the mathematical model of EA configuration optimization design is established, which transforms EA configuration optimization design into an integer nonlinear programming problem. An adaptive mutation operator and crossover operator are selected according to the differences in the fitness values of the individuals awaiting mutation in the population, and a novel Modified Adaptive Differential Evolution (MADE) algorithm is proposed and used to solve the EA configuration optimization design problem. Simulation experiments and analysis indicate that MADE effectively improves the speed, accuracy, and stability of the algorithm. Moreover, the proposed EA configuration optimization design method can select the most reasonable EA configuration design and can play an important guiding role in EA optimization design.
Funding: Science and Technology on Avionics Integration Laboratory and Aeronautical Science Foundation of China (20115551022)
Abstract: This paper presents a simple yet effective decoding scheme for general quasi-cyclic low-density parity-check (QC-LDPC) codes, which not only achieves high hardware utility efficiency (HUE) but also brings about a large reduction in memory blocks without any performance degradation. The main idea is to split the check matrix into several row blocks and then to perform the improved message passing computations sequentially, block by block. With the improved decoding algorithm, the sequential tie between the two-phase computations is broken, so that the two-phase computations can be overlapped, which yields high HUE. Two overlapping schemes are also presented, each of which suits a different situation. In addition, an efficient memory arrangement scheme is proposed to reduce the large memory block requirement of the LDPC decoder. As an example, for the 0.4 rate LDPC code selected from Chinese Digital TV Terrestrial Broadcasting (DTTB), our decoding saves over 80% of the memory blocks compared with conventional decoding, and the decoder achieves 0.97 HUE. Finally, the 0.4 rate LDPC decoder is implemented on an FPGA device EP2S30 (speed grade -5). Using 8 row processing units, the decoder can achieve a maximum net throughput of 28.5 Mbps at 20 iterations.
Funding: supported in part by the National Key R&D Program of China (No. 2019YFB1803400).
Abstract: For polar codes, the performance of successive cancellation list (SCL) decoding is capable of approaching that of maximum likelihood decoding. However, existing hardware architectures for SCL decoding suffer from high hardware complexity because they calculate L decoding paths simultaneously, which makes them unfriendly to devices with limited logic resources, such as field programmable gate arrays (FPGAs). In this paper, we propose a low-complexity list-serial pipelined hardware architecture for SCL decoding, in which serial calculation and pipelined operation are combined to strike a balance between complexity and latency. Moreover, we employ only one successive cancellation (SC) decoder core without L×L crossbars, and reduce the number of inputs of the metric sorter from 2L to L+2. Finally, the FPGA implementations show that the hardware resource consumption is significantly reduced with negligible decoding performance loss.
Funding: supported by the Special Funds for Basic Scientific Research Business Expenses of Central Universities No. 2014GCYY0; the Beijing Natural Science Foundation No. 4163076; and the Fundamental Research Funds for the Central Universities No. 328201801.
Abstract: Hardware Trojan (HT) refers to a special module intentionally implanted into a chip or an electronic system. The module can be exploited by an attacker to achieve destructive functions. Unfortunately, the HT is difficult to detect due to its minimal resource occupation. In order to achieve accurate detection with high efficiency, an HT detection method based on the electromagnetic leakage of the chip is proposed in this paper. First, the dimensionality reduction and feature extraction of the electromagnetic leakage signals in each group (template chip, Trojan-free chip, and target chip) are realized by principal component analysis (PCA). Then, the Mahalanobis distances between the template group and the other groups are calculated. Finally, the Mahalanobis distances are compared against a threshold to determine whether an HT has been implanted into the target chip. In addition, the concept of HT Detection Quality (HTDQ) is proposed to analyze and compare the performance of different detection methods. Our experimental results indicate that the accuracy of this detection method is 91.93% and the time consumption is 0.042 s on average, which shows a high HTDQ compared with three other methods.