On-device artificial intelligence (AI) accelerators capable of not only inference but also training neural network models are in increasing demand in the industrial AI field, where production changes often make frequent retraining necessary. Batch normalization (BN) is fundamental to training convolutional neural networks (CNNs), but its implementation in compact accelerator chips remains challenging due to computational complexity, particularly in calculating statistical parameters and gradients across mini-batches. Existing accelerator architectures either compromise the training accuracy of CNNs through approximations or require substantial computational resources, limiting their practical deployment. We present a hardware-optimized BN accelerator that maintains training accuracy while significantly reducing computational overhead through three novel techniques: (1) resource sharing for efficient resource utilization across forward and backward passes, (2) interleaved buffering for reduced dynamic random-access memory (DRAM) access latencies, and (3) zero-skipping for minimal gradient computation. Implemented on a VCU118 Field Programmable Gate Array (FPGA) at 100 MHz and validated using You Only Look Once version 2-tiny (YOLOv2-tiny) on the PASCAL Visual Object Classes (VOC) dataset, our normalization accelerator achieves a 72% reduction in processing time and 83% lower power consumption compared to a software normalization implementation on a 2.4 GHz Intel Central Processing Unit (CPU), while maintaining accuracy (0.51% mean Average Precision (mAP) drop at 32-bit floating point (FP32), 1.35% at 16-bit brain floating point (bfloat16)). When integrated into a neural processing unit (NPU), the design demonstrates 63% and 97% performance improvements over AMD CPU and Reduced Instruction Set Computer-V (RISC-V) implementations, respectively. These results confirm that the proposed BN hardware design enables efficient, high-accuracy, and power-saving on-device training for modern CNNs. They also demonstrate that an efficient hardware implementation of standard batch normalization is achievable without sacrificing accuracy, enabling practical on-device CNN training with significantly reduced computational and power requirements.
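The mini-batch statistics and gradients that the abstract identifies as the costly part of BN can be sketched in a few lines of plain Python. This is a minimal software illustration of standard batch normalization for a single channel, not the accelerator's design: the function names `bn_forward` and `bn_backward`, the flat-list data layout, and the `eps` default are all assumptions made for this example. The zero check in the backward reductions is only one plausible reading of "zero-skipping" (terms with a zero upstream gradient add nothing to the sums); the abstract does not say how the hardware applies it.

```python
import math


def bn_forward(x, gamma, beta, eps=1e-5):
    """Standard BN forward pass for one channel of a mini-batch (flat list)."""
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance, as used by standard BN during training.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [gamma * h + beta for h in x_hat]
    # Cache what the backward pass needs.
    return y, (x_hat, inv_std, gamma)


def bn_backward(dy, cache):
    """Standard BN backward pass; returns (dx, dgamma, dbeta)."""
    x_hat, inv_std, gamma = cache
    n = len(dy)
    sum_dy = 0.0
    sum_dy_xhat = 0.0
    for g, h in zip(dy, x_hat):
        if g == 0.0:
            # Assumed illustration of zero-skipping: a zero upstream
            # gradient contributes nothing to either reduction.
            continue
        sum_dy += g
        sum_dy_xhat += g * h
    dgamma = sum_dy_xhat
    dbeta = sum_dy
    scale = gamma * inv_std / n
    dx = [scale * (n * g - sum_dy - h * sum_dy_xhat) for g, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta
```

Both passes need full reductions over the mini-batch (mean and variance going forward, two gradient sums going backward) before any per-element output can be produced, which is why these steps dominate the cost of BN during training.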
To counter presentation attacks on iris recognition systems, an iris liveness detection scheme based on a batch-normalized convolutional neural network (BNCNN) is proposed to improve the reliability of iris authentication. The eighteen-layer BNCNN architecture, comprising convolutional, batch normalization (BN), ReLU, pooling, and fully connected layers, is constructed to distinguish genuine irises from fake ones. The iris image is first preprocessed by iris segmentation and normalized to 256×256 pixels, and the iris features are then extracted by the BNCNN. Using these features, the decision-making layer classifies the iris as genuine or fake. Batch normalization is used in the BNCNN to avoid overfitting and vanishing gradients during training. Extensive experiments are conducted on three classical databases: the CASIA Iris Lamp database, the CASIA Iris Syn database, and the Ndcontact database. The results show that the proposed method effectively extracts micro-texture features of the iris and achieves higher detection accuracy than some typical iris liveness detection methods.
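To address the low accuracy and the tendency to overfit during training that traditional neural networks show when classifying images from the CIFAR-10 (Canadian Institute For Advanced Research) dataset, a new method for constructing neural network architectures that combines convolutional neural networks with batch normalization is proposed. The method first applies data augmentation and border padding to the dataset. It then modifies a typical CNN (Convolutional Neural Network) architecture by removing the pooling layers from the convolutional layer groups, keeping only the convolutional and BN (Batch Normalization) layers, and adding a moderate number of convolutional layer groups. To verify the effectiveness and accuracy of the model, six different network architectures were designed and trained. The experimental results show that, for the same number of training epochs, the recommended model-6 performs best, with a test accuracy of 90.17%. This breaks the long-standing bottleneck in which classic CNNs struggle to reach 90% accuracy on CIFAR-10, and provides a new solution and model reference for image classification.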
Funding for the BN accelerator study: Supported by the National Research Foundation of Korea (NRF) grant for RLRC funded by the Korea government (MSIT) (No. 2022R1A5A8026986, RLRC); by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01304, Development of Self-Learnable Mobile Recursive Neural Network Processor Technology); by the MSIT (Ministry of Science and ICT), Republic of Korea, under the Grand Information Technology Research Center support program (IITP-2024-2020-0-01462, Grand-ICT) supervised by the IITP; by the Korea Technology and Information Promotion Agency for SMEs (TIPA); and by the Korean government (Ministry of SMEs and Startups)'s Smart Manufacturing Innovation R&D (RS-2024-00434259).
Funding for the iris liveness detection study: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61572182 and 61370225) and the Hunan Provincial Natural Science Foundation of China (Grant No. 15JJ2007).