Journal Articles
3 articles found
1. Two-Stage Early Exiting From Globality Towards Reliability
Authors: Jianing He, Qi Zhang, Hongyun Zhang, Duoqian Miao. CAAI Transactions on Intelligence Technology, 2025, No. 4, pp. 1019-1032 (14 pages).
Early exiting has shown significant potential in accelerating the inference of pre-trained language models (PLMs) by allowing easy samples to exit from shallow layers. However, existing early exiting methods primarily rely on local information from individual samples to estimate prediction uncertainty for making exiting decisions, overlooking the global information provided by the sample population. This impacts the estimation of prediction uncertainty, compromising the reliability of exiting decisions. To remedy this, inspired by principal component analysis (PCA), the authors define a residual score to capture the deviation of features from the principal space of the sample population, providing a global perspective for estimating prediction uncertainty. Building on this, a two-stage exiting strategy is proposed that integrates global information from residual scores with local information from energy scores at both the decision and feature levels. This strategy incorporates three-way decisions to enable more reliable exiting decisions for boundary region samples by delaying judgement. Extensive experiments on the GLUE benchmark validate that the method achieves an average speed-up ratio of 2.17× across all tasks with minimal performance degradation. Additionally, it surpasses the state-of-the-art E-LANG by 11% in model acceleration, along with a performance improvement of 0.6 points, demonstrating a better performance-efficiency trade-off.
Keywords: early exiting; inference acceleration; pre-trained language model; principal component analysis; three-way decisions
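A minimal sketch of the residual-score idea summarised in the abstract above: the principal subspace is fitted on hidden features from a calibration population, the residual score measures how far a sample's feature falls outside that subspace (global evidence), and an energy score from the logits supplies local evidence for a three-way exit decision. All thresholds and function names are illustrative assumptions, not the authors' released code.

import numpy as np

def fit_principal_space(calib_features, k=32):
    # calib_features: (N, d) hidden features collected at one exit layer
    mean = calib_features.mean(axis=0)
    # principal directions of the sample population via SVD of centred features
    _, _, vt = np.linalg.svd(calib_features - mean, full_matrices=False)
    return mean, vt[:k]                       # top-k principal directions, shape (k, d)

def residual_score(h, mean, basis):
    # norm of the component of h lying outside the principal subspace
    centred = h - mean
    projected = basis.T @ (basis @ centred)
    return float(np.linalg.norm(centred - projected))

def energy_score(logits, temperature=1.0):
    # lower energy corresponds to a more confident prediction (local information)
    z = np.asarray(logits) / temperature
    return float(-temperature * np.logaddexp.reduce(z))

def exit_decision(h, logits, mean, basis, tau_r, tau_e):
    # three-way decision: exit now, keep going, or delay judgement (boundary region)
    r_ok = residual_score(h, mean, basis) < tau_r   # global evidence
    e_ok = energy_score(logits) < tau_e             # local evidence
    if r_ok and e_ok:
        return "exit"
    if not r_ok and not e_ok:
        return "continue"
    return "boundary"   # conflicting evidence: deferred to the feature-level stage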
2. Accelerating BERT inference with GPU-efficient exit prediction
Authors: Lei LI, Chengyu WANG, Minghui QIU, Cen CHEN, Ming GAO, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, No. 3, pp. 31-42 (12 pages).
BERT is a representative pre-trained language model that has drawn extensive attention for significant improvements in downstream Natural Language Processing (NLP) tasks. The complex architecture and massive parameters bring BERT competitive performance but also result in slow speed at model inference time. To speed up BERT inference, FastBERT realizes adaptive inference with an acceptable drop in accuracy based on knowledge distillation and the early-exit technique. However, many factors may limit the performance of FastBERT, such as the teacher classifier that is not knowledgeable enough, the batch size shrinkage, and the redundant computation of student classifiers. To overcome these limitations, we propose a new BERT inference method with GPU-Efficient Exit Prediction (GEEP). GEEP leverages the shared exit loss to simplify the training process of FastBERT from two steps into only one step and makes the teacher classifier more knowledgeable by feeding diverse Transformer outputs to the teacher classifier. In addition, the exit layer prediction technique is proposed to utilize a GPU hash table to handle the token-level exit layer distribution and to sort test samples by predicted exit layers. In this way, GEEP can avoid batch size shrinkage and redundant computation of student classifiers. Experimental results on twelve public English and Chinese NLP datasets prove the effectiveness of the proposed approach. The source codes of GEEP will be released to the public upon paper acceptance.
Keywords: BERT; FastBERT; inference acceleration; model distillation; early exit; text classification
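A minimal sketch of the exit-layer-prediction step described above: token-level exit statistics are kept in a lookup table and used to predict each test sample's exit layer, so samples expected to exit at similar depths can be batched together and whole batches stop early instead of shrinking one sample at a time. The paper uses a GPU hash table; the plain-Python table and helper names below are illustrative assumptions, not the GEEP source code.

from collections import defaultdict

# token_id -> [sum of observed exit layers, count] for samples containing that token
token_exit_stats = defaultdict(lambda: [0.0, 0])

def record_exit(token_ids, exit_layer):
    # update token-level exit-layer statistics after a sample exits
    for t in token_ids:
        stats = token_exit_stats[t]
        stats[0] += exit_layer
        stats[1] += 1

def predict_exit_layer(token_ids, default_layer=12):
    # predict a sample's exit layer from the tokens it contains
    layers = []
    for t in token_ids:
        total, count = token_exit_stats[t]
        if count > 0:
            layers.append(total / count)
    return sum(layers) / len(layers) if layers else default_layer

def sort_by_predicted_exit(samples):
    # group samples with similar predicted exit layers into adjacent batches,
    # avoiding batch size shrinkage during adaptive inference
    return sorted(samples, key=predict_exit_layer)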
3. Adaptive Model Compression for Steel Plate Surface Defect Detection: An Expert Knowledge and Working Condition-Based Approach
Authors: Maojie Sun, Fang Dong, Zhaowu Huang, Junzhou Luo. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, No. 6, pp. 1851-1871 (21 pages).
The steel plate is one of the main products in steel industries, and its surface quality directly affects the final product performance. How to detect surface defects of steel plates in real time during the production process is a challenging problem. Single or fixed model compression methods cannot be directly applied to the detection of steel surface defects, because it is difficult for them to account for the diversity of production tasks, the uncertainty caused by environmental factors such as communication networks, and the influence of process and working conditions in steel plate production. In this paper, we propose an adaptive model compression method for online detection of steel surface defects based on expert knowledge and working conditions. First, we establish an expert system that outputs lightweight model parameters based on the correlation between defect types and manufacturing processes. Then, the lightweight model parameters are adaptively adjusted according to working conditions, which improves detection accuracy while ensuring real-time performance. The experimental results show that, compared with a detection method using a constant lightweight parameter model, the proposed method reduces total detection time by 23.1%, increases the deadline satisfaction ratio by 36.5%, improves accuracy by 4.2%, and reduces the false detection rate by 4.3%.
Keywords: steel surface defect detection; inference acceleration; model compression; expert knowledge; pruning; quantization
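A minimal sketch of the adaptive compression idea summarised above: an expert table maps each manufacturing process (and the defect characteristics expected under it) to baseline pruning and quantization settings, which are then adjusted by current working conditions such as line speed and available bandwidth. All table entries, field names, thresholds, and adjustment rules are illustrative assumptions, not the paper's expert system.

from dataclasses import dataclass

# expert knowledge: process -> baseline compression settings for its typical defects
EXPERT_TABLE = {
    "hot_rolling":  {"prune_ratio": 0.3, "quant_bits": 8},   # large, salient defects
    "cold_rolling": {"prune_ratio": 0.1, "quant_bits": 16},  # fine scratches need detail
}

@dataclass
class WorkingCondition:
    line_speed_mps: float      # strip speed in metres per second
    bandwidth_mbps: float      # available network bandwidth

def select_compression(process: str, cond: WorkingCondition) -> dict:
    # start from the expert baseline for this manufacturing process
    cfg = dict(EXPERT_TABLE[process])
    # faster lines leave less time per frame: prune more aggressively
    if cond.line_speed_mps > 5.0:
        cfg["prune_ratio"] = min(cfg["prune_ratio"] + 0.2, 0.6)
    # poor connectivity: shrink the model further with lower-bit quantization
    if cond.bandwidth_mbps < 10.0:
        cfg["quant_bits"] = min(cfg["quant_bits"], 8)
    return cfg

# example: a fast cold-rolling line with a weak network link
print(select_compression("cold_rolling", WorkingCondition(6.0, 5.0)))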