Funding: Supported by the National Natural Science Foundation of China (No. 62376198), the National Key Research and Development Program of China (No. 2022YFB3104700), and the Shanghai Baiyulan Pujiang Project (No. 08002360429).
Abstract: Early exiting has shown significant potential for accelerating the inference of pre-trained language models (PLMs) by allowing easy samples to exit from shallow layers. However, existing early exiting methods rely primarily on local information from individual samples to estimate prediction uncertainty when making exiting decisions, overlooking the global information provided by the sample population. This degrades the estimation of prediction uncertainty and compromises the reliability of exiting decisions. To remedy this, and inspired by principal component analysis (PCA), the authors define a residual score that captures the deviation of features from the principal space of the sample population, providing a global perspective for estimating prediction uncertainty. Building on this, a two-stage exiting strategy is proposed that integrates global information from residual scores with local information from energy scores at both the decision and feature levels. This strategy incorporates three-way decisions, which enable more reliable exiting decisions for samples in the boundary region by delaying judgement. Extensive experiments on the GLUE benchmark show that the method achieves an average speed-up ratio of 2.17× across all tasks with minimal performance degradation. Additionally, it surpasses the state-of-the-art E-LANG by 11% in model acceleration, along with a performance improvement of 0.6 points, demonstrating a better performance-efficiency trade-off.
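The two uncertainty signals named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy score follows the common definition E(x) = -T·logsumexp(f(x)/T); the residual score is assumed to be the norm of a centred feature's component orthogonal to the top-k principal directions of a reference feature population; and the function names, the thresholds `alpha`/`beta`, and the label strings are hypothetical. The abstract does not say how the two scores are fused, so `three_way` simply applies a three-way rule to whatever uncertainty value the caller supplies.

```python
import numpy as np


def energy(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Energy E(x) = -T * logsumexp(logits / T); lower energy means more confident."""
    z = logits / T
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    lse = m[..., 0] + np.log(np.exp(z - m).sum(axis=-1))
    return -T * lse


def fit_principal_space(feats: np.ndarray, k: int):
    """Return the mean and the top-k principal directions (d x k) of a feature population."""
    mu = feats.mean(axis=0)
    _, _, vt = np.linalg.svd(feats - mu, full_matrices=False)
    return mu, vt[:k].T


def residual_score(x: np.ndarray, mu: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Assumed residual score: norm of the part of (x - mu) outside span(P)."""
    c = x - mu
    return np.linalg.norm(c - c @ P @ P.T, axis=-1)


def three_way(u: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    """Hypothetical three-way rule on an uncertainty u (alpha < beta):
    exit if u <= alpha, continue if u >= beta, otherwise defer as 'boundary'."""
    return np.where(u <= alpha, "exit", np.where(u >= beta, "continue", "boundary"))
```

In use, `fit_principal_space` would be fitted once per exit layer on features collected from training data, and samples labelled `"boundary"` would be re-examined at a later layer rather than being forced into an immediate exit-or-continue decision.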
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U1911203, 61877018, 61977025, and 62202170) and by Alibaba Group through the Alibaba Innovation Research Program.
Abstract: BERT is a representative pre-trained language model that has drawn extensive attention for its significant improvements on downstream Natural Language Processing (NLP) tasks. Its complex architecture and massive number of parameters give BERT competitive performance but also make inference slow. To speed up BERT inference, FastBERT performs adaptive inference with an acceptable drop in accuracy, based on knowledge distillation and the early-exit technique. However, many factors may limit the performance of FastBERT, such as an insufficiently knowledgeable teacher classifier, batch size shrinkage, and redundant computation in the student classifiers. To overcome these limitations, we propose a new BERT inference method with GPU-Efficient Exit Prediction (GEEP). GEEP uses a shared exit loss to simplify FastBERT's training process from two steps to one, and makes the teacher classifier more knowledgeable by feeding it diverse Transformer outputs. In addition, we propose an exit layer prediction technique that uses a GPU hash table to handle the token-level exit layer distribution and sorts test samples by their predicted exit layers. In this way, GEEP avoids batch size shrinkage and redundant computation in the student classifiers. Experimental results on twelve public English and Chinese NLP datasets demonstrate the effectiveness of the proposed approach. The source code of GEEP will be released to the public upon paper acceptance.
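The batching idea above (sort test samples by predicted exit layer so each batch runs to a single depth and never shrinks) can be sketched as follows. This is a hypothetical illustration, not GEEP's code: a plain Python dict of `Counter`s stands in for the GPU hash table, and the rule that each token votes for its most frequent training-time exit layer while the sample takes the deepest vote is an assumption made only for the example.

```python
from collections import Counter
from typing import Dict, List


def predict_exit_layer(tokens: List[str], table: Dict[str, Counter], num_layers: int) -> int:
    """Hypothetical predictor: each known token votes for its most frequent exit layer
    (as recorded during training); the sample takes the deepest vote. Unseen tokens,
    or tokens with an empty histogram, vote for the last layer."""
    votes = [table[t].most_common(1)[0][0] if table.get(t) else num_layers for t in tokens]
    return max(votes) if votes else num_layers


def batches_by_exit_layer(pred_exit: List[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so that every batch shares one predicted exit layer."""
    # Stable sort of sample indices by predicted exit layer.
    order = sorted(range(len(pred_exit)), key=lambda i: pred_exit[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for i in order:
        # Start a new batch when the predicted layer changes or the batch is full.
        if current and (pred_exit[i] != pred_exit[current[0]] or len(current) == batch_size):
            batches.append(current)
            current = []
        current.append(i)
    if current:
        batches.append(current)
    return batches
```

For example, `batches_by_exit_layer([3, 1, 3, 2, 1, 3], 2)` yields `[[1, 4], [3], [0, 2], [5]]`: every batch holds samples with the same predicted depth, so no sample has to drop out partway through a forward pass.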
Funding: Supported by the National Key R&D Program of China (No. 2018AAA0100500), the National Natural Science Foundation of China (Nos. 62232004 and 61632008), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the Big Data Computing Center of Southeast University in China, which provided the experimental environment and computing facilities.
Abstract: Steel plate is one of the main products of the steel industry, and its surface quality directly affects final product performance. Detecting surface defects of steel plates in real time during production is a challenging problem. A single or fixed model compression method cannot be applied directly to steel surface defect detection, because it can hardly account for the diversity of production tasks, the uncertainty caused by environmental factors such as communication networks, and the influence of process and working conditions in steel plate production. In this paper, we propose an adaptive model compression method for online steel surface defect detection based on expert knowledge and working conditions. First, we establish an expert system that provides lightweight model parameters based on the correlation between defect types and manufacturing processes. Then, the lightweight model parameters are adaptively adjusted according to working conditions, which improves detection accuracy while ensuring real-time performance. Experimental results show that, compared with a detection method using a model with constant lightweight parameters, the proposed method reduces the total detection time by 23.1% and increases the deadline satisfaction ratio by 36.5%, while improving accuracy by 4.2% and reducing the false detection rate by 4.3%.
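The two-step adaptation described above (expert rules propose candidate lightweight settings, then working conditions select among them), together with the deadline satisfaction ratio, can be sketched as below. Everything here is hypothetical: the `LiteConfig` fields, the rule-table contents, the latency model `base_latency_ms * load + net_delay_ms`, the selection policy, and the reading of the deadline satisfaction ratio as the fraction of tasks finished on time are illustrative assumptions, not the paper's expert system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LiteConfig:
    """One lightweight-model setting (all fields hypothetical)."""
    prune_ratio: float      # fraction of channels removed
    base_latency_ms: float  # inference time on an unloaded device
    est_accuracy: float     # offline-estimated detection accuracy


# Hypothetical expert rules: (defect type, process) -> candidate settings.
EXPERT_RULES: Dict[Tuple[str, str], List[LiteConfig]] = {
    ("scratch", "hot_rolling"): [
        LiteConfig(0.2, 40.0, 0.95),
        LiteConfig(0.5, 25.0, 0.92),
        LiteConfig(0.7, 15.0, 0.88),
    ],
}


def choose_config(defect: str, process: str, load: float,
                  net_delay_ms: float, deadline_ms: float) -> LiteConfig:
    """Pick the most accurate candidate whose predicted latency meets the deadline;
    fall back to the fastest candidate if none does."""
    candidates = EXPERT_RULES[(defect, process)]
    # Assumed latency model: compute time scaled by device load plus network delay.
    feasible = [c for c in candidates
                if c.base_latency_ms * load + net_delay_ms <= deadline_ms]
    if feasible:
        return max(feasible, key=lambda c: c.est_accuracy)
    return min(candidates, key=lambda c: c.base_latency_ms)


def deadline_satisfaction_ratio(latencies_ms: List[float], deadline_ms: float) -> float:
    """Assumed definition: fraction of detection tasks finished within the deadline."""
    if not latencies_ms:
        raise ValueError("no detection tasks given")
    return sum(t <= deadline_ms for t in latencies_ms) / len(latencies_ms)
```

Under light load every candidate fits and the least-pruned model is chosen; as load or network delay grows, the selection shifts toward more aggressive pruning, trading some accuracy for on-time detection.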