Funding: Supported by the National Research Foundation of Korea (NRF) grant for RLRC funded by the Korea government (MSIT) (No. 2022R1A5A8026986, RLRC); by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01304, Development of Self-Learnable Mobile Recursive Neural Network Processor Technology); by the MSIT (Ministry of Science and ICT), Republic of Korea, under the Grand Information Technology Research Center support program (IITP-2024-2020-0-01462, Grand-ICT) supervised by the IITP; and by the Korea Technology and Information Promotion Agency for SMEs (TIPA) under the Korean government (Ministry of SMEs and Startups) Smart Manufacturing Innovation R&D program (RS-2024-00434259).
Abstract: On-device artificial intelligence (AI) accelerators that can train neural network models, not just run inference, are in increasing demand in industrial AI, where frequent production changes make frequent retraining crucial. Batch normalization (BN) is fundamental to training convolutional neural networks (CNNs), but implementing it in compact accelerator chips remains challenging because of its computational complexity, particularly in calculating statistical parameters and gradients across mini-batches. Existing accelerator architectures either compromise CNN training accuracy through approximations or require substantial computational resources, which limits their practical deployment. We present a hardware-optimized BN accelerator that maintains training accuracy while significantly reducing computational overhead through three novel techniques: (1) resource sharing for efficient resource utilization across forward and backward passes, (2) interleaved buffering for reduced dynamic random-access memory (DRAM) access latencies, and (3) zero-skipping for minimal gradient computation. Implemented on a VCU118 field-programmable gate array (FPGA) at 100 MHz and validated using You Only Look Once version 2-tiny (YOLOv2-tiny) on the PASCAL Visual Object Classes (VOC) dataset, our normalization accelerator achieves a 72% reduction in processing time and 83% lower power consumption compared to a software normalization implementation on a 2.4 GHz Intel central processing unit (CPU), while maintaining accuracy (a 0.51% drop in mean Average Precision (mAP) at 32-bit floating point (FP32) and a 1.35% drop at 16-bit brain floating point (bfloat16)). When integrated into a neural processing unit (NPU), the design demonstrates 63% and 97% performance improvements over AMD CPU and Reduced Instruction Set Computing-V (RISC-V) implementations, respectively. These results confirm that standard batch normalization can be implemented efficiently in hardware without sacrificing accuracy, enabling practical, power-saving on-device training for modern CNNs with significantly reduced computational requirements.
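For reference, the statistics and gradients that make BN costly can be written out for a single channel. The sketch below is the standard textbook BN forward and backward pass in plain Python, not the accelerator design described in the abstract; the function names `bn_forward` and `bn_backward` and the default `eps` are assumptions made for illustration.

```python
import math

def bn_forward(x, gamma, beta, eps=1e-5):
    """Standard BN forward pass for one channel of a mini-batch (list of floats)."""
    n = len(x)
    mean = sum(x) / n                              # first pass over the batch
    var = sum((v - mean) ** 2 for v in x) / n      # second pass (biased variance)
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [gamma * h + beta for h in x_hat]
    return y, (x_hat, inv_std)                     # cache is reused by the backward pass

def bn_backward(dy, cache, gamma):
    """Standard BN backward pass: gradients w.r.t. the input, gamma, and beta."""
    x_hat, inv_std = cache
    n = len(dy)
    dbeta = sum(dy)                                # reduction over the batch
    dgamma = sum(d * h for d, h in zip(dy, x_hat)) # second reduction over the batch
    scale = gamma * inv_std / n
    dx = [scale * (n * d - dbeta - h * dgamma) for d, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta

if __name__ == "__main__":
    y, cache = bn_forward([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
    dx, dgamma, dbeta = bn_backward([0.1, -0.2, 0.3, 0.4], cache, gamma=1.0)
    print(y, dx, dgamma, dbeta)
```

Both passes need full-batch reductions (mean and variance going forward, `dbeta` and `dgamma` going backward) before any per-element output can be produced, which is the kind of cost the abstract points to.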
Funding: The Foundation of the State Key Laboratory of Ocean Engineering of Shanghai Jiao Tong University (No. GKZD010071).
Abstract: Wastewater treatment is a complicated dynamic process affected by microbial, chemical, and physical factors. Faults are inevitable during the operation of modified sequencing batch reactors (MSBRs) because of the uncertainty of these factors. Abnormal MSBR results require fault diagnosis to determine the cause of failure and to implement appropriate measures to adjust system operations. A Bayesian network (BN) is a powerful knowledge representation tool that deals explicitly with uncertainty. This study develops a BN-based approach to diagnosing MSBR-based wastewater treatment systems. The network is constructed using knowledge derived from the literature and elicited from experts, and it is parametrized using independent data from a pilot test. A one-year pilot study is conducted to verify the diagnostic analysis. The proposed model is reasonable, and the diagnosis results are accurate. This approach can be applied with minimal modifications to other types of wastewater treatment plants.
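The diagnostic use of a Bayesian network can be illustrated with the smallest possible case: one cause node pointing to one symptom node, with the posterior over causes obtained by Bayes' rule once the symptom is observed. This is only a sketch. The cause names, the symptom, and every probability below are hypothetical and are not taken from the study.

```python
def posterior(priors, likelihoods):
    """Posterior P(cause | symptom observed) for a two-node network Cause -> Symptom.

    priors:      P(cause) for each candidate cause
    likelihoods: P(symptom observed | cause) for each candidate cause
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())                 # P(symptom observed)
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical numbers for illustration only.
PRIORS = {"normal": 0.90, "aeration_fault": 0.06, "sludge_bulking": 0.04}
# Hypothetical P(high effluent ammonia | cause).
LIKELIHOODS = {"normal": 0.05, "aeration_fault": 0.80, "sludge_bulking": 0.30}

if __name__ == "__main__":
    post = posterior(PRIORS, LIKELIHOODS)
    for cause, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(f"{cause}: {p:.3f}")
```

A real diagnostic network would have many interconnected nodes and would normally be queried with a dedicated inference library; the point here is only that observing an abnormal result shifts probability mass toward the causes that explain it best.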
Funding: Support from the National Research Foundation of Singapore, NRF-NSFC (Grant No. NRF2018NRF-NSFC003SB-006).
Abstract: Batch effects are technical sources of variation and can confound analysis. While many performance ranking exercises have been conducted to establish the best batch effect-correction algorithm (BECA), we hold the viewpoint that the notion of best is context-dependent. Moreover, alternative questions beyond the simplistic notion of "best" are also interesting: are BECAs robust against various degrees of confounding and, if so, what is the limit? Using two different methods for simulating class (phenotype) and batch effects, and taking various representative datasets across both genomics (RNA-Seq) and proteomics platforms, we demonstrate that when sample classes and batch factors are moderately confounded, most BECAs are remarkably robust and only weakly affected by upstream normalization procedures. This observation is consistently supported across the multitude of test datasets. BECAs do have limits: when sample classes and batch factors are strongly confounded, BECA performance declines, with variable performance in precision, recall, and batch correction. We also report that while conventional normalization methods have minimal impact on batch effect correction, they do not affect downstream statistical feature selection and, in strongly confounded scenarios, may even outperform BECAs. In other words, removing batch effects is no guarantee of optimal functional analysis. Overall, this study suggests that simplistic performance ranking exercises are quite trivial and that all BECAs are compromises in some context or another.
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61572182 and 61370225) and by the Hunan Provincial Natural Science Foundation of China (Grant No. 15JJ2007).
Abstract: To counter presentation attacks on iris recognition systems, an iris liveness detection scheme based on a batch-normalized convolutional neural network (BNCNN) is proposed to improve the reliability of iris authentication. An eighteen-layer BNCNN architecture is constructed to distinguish genuine irises from fake ones; it comprises convolutional layers, batch-normalized (BN) layers, ReLU layers, pooling layers, and fully connected layers. The iris image is first preprocessed by iris segmentation and normalized to 256×256 pixels, and the iris features are then extracted by the BNCNN. With these features, the decision-making layer classifies the iris as genuine or fake. Batch normalization is used in the BNCNN to avoid overfitting and vanishing gradients during training. Extensive experiments are conducted on three classical databases: the CASIA Iris Lamp database, the CASIA Iris Syn database, and the Ndcontact database. The results show that the proposed method can effectively extract micro-texture features of the iris and achieves higher detection accuracy than some typical iris liveness detection methods.
Funding: The National Natural Science Foundation of China (Nos. 61772417, 61634004, 61602377), Key R&D Program Projects in Shaanxi Province (No. 2017GY-060), and the Shaanxi Natural Science Basic Research Project (No. 2018JM4018).
Abstract: To address the low accuracy, heavy computation, and complex logic of deep learning algorithms for behavior recognition, a behavior recognition network is designed that fuses a three-dimensional batch normalization visual geometry group network (3D-BN-VGG) with a long short-term memory (LSTM) network. In this network, 3D convolutional layers extract the spatial-domain and time-domain features of a video sequence simultaneously, and multiple small convolution kernels are stacked in place of large convolution kernels, which deepens the network while reducing the number of parameters. In addition, the batch normalization algorithm is added to the 3D convolutional network to improve training speed. The output of the fully connected layer is then sent to the LSTM network as feature vectors to extract sequence information. This method, which directly uses the output of the whole base level without passing through the fully connected layer, reduces the parameters of the whole fusion network to 15,324,485, roughly half the parameter count of 3D-BN-VGG. Finally, the proposed network achieves 96.5% and 74.9% accuracy on UCF-101 and HMDB-51, respectively, and the algorithm runs at 1066 fps with an acceleration ratio of 1, a significant speed advantage.
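The claim that stacking small kernels reduces parameters can be checked with simple arithmetic. The sketch below compares two stacked 3×3×3 convolutions, which together cover a 5×5×5 receptive field, against a single 5×5×5 convolution. The channel width of 64 and the helper name `conv3d_params` are assumptions chosen for illustration and are not taken from the paper.

```python
def conv3d_params(c_in, c_out, k):
    """Weights plus biases of one 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3 + c_out

CHANNELS = 64  # hypothetical channel width, kept the same in and out

# Two stacked 3x3x3 layers see the same 5x5x5 receptive field as one 5x5x5 layer.
stacked_small = 2 * conv3d_params(CHANNELS, CHANNELS, 3)
single_large = conv3d_params(CHANNELS, CHANNELS, 5)

if __name__ == "__main__":
    print("two 3x3x3 layers:", stacked_small)   # 221312
    print("one 5x5x5 layer: ", single_large)    # 512064
    print("ratio:", round(stacked_small / single_large, 3))
```

Under these assumptions the stacked version uses well under half the parameters of the single large kernel, while adding one more layer (and one more nonlinearity) to the network.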