Abstract
The Vision Transformer has achieved outstanding performance on many tasks in computer vision, but its complex network structure typically requires substantial storage and computational resources, making wide deployment on resource-constrained devices difficult. To address this, a compression method for the Vision Transformer based on pruning and distillation is proposed, aiming to reduce model size while preserving performance. First, through a structural analysis of the Vision Transformer, the targets of width pruning are identified as the attention heads in the multi-head self-attention mechanism and the hidden-layer neurons of the multi-layer perceptron, and their importance is assessed with a parameter-importance evaluation strategy based on changes in the model's loss function. Next, a post-pruning distillation strategy prunes the model in the width dimension and restores the accuracy of the pruned width subnetwork. Finally, in the depth dimension, the final compressed model is obtained through post-pruning distillation. The proposed method is evaluated by compressing Vision Transformers on the Tiny ImageNet, CIFAR-100, and CIFAR-10 datasets. On Tiny ImageNet, with the parameter count and computational cost reduced by 30%, the accuracy of the ViT-S model drops by only 0.3%, while that of the ViT-B model even improves by 0.6%. The experimental results show that the proposed method effectively balances model accuracy and compression ratio.
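The loss-change-based importance criterion described in the abstract can be illustrated with a first-order Taylor approximation, a common way to estimate how much the loss would change if a structure (here, an attention head) were removed. A minimal stdlib-only sketch under that assumption; the function names and the per-head flattened weight/gradient lists are illustrative, not taken from the paper:

```python
def head_importance(weights, grads):
    """Approximate |ΔL| for zeroing out one attention head via a
    first-order Taylor expansion of the loss: |sum_i g_i * w_i|,
    accumulated over the head's (flattened) parameters."""
    return abs(sum(w * g for w, g in zip(weights, grads)))

def select_heads_to_keep(heads, keep_ratio):
    """heads: list of (weights, grads) pairs, one per attention head.
    Scores every head, then keeps the top fraction given by keep_ratio.
    Returns the sorted indices of the heads that survive pruning."""
    scores = [head_importance(w, g) for w, g in heads]
    k = max(1, round(len(heads) * keep_ratio))
    ranked = sorted(range(len(heads)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

In this sketch, heads whose parameters interact weakly with the gradient of the loss receive low scores and are pruned first; the surviving subnetwork would then be fine-tuned with distillation from the unpruned model, as the abstract outlines.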
Authors
ZHENG Yang; JIANG Xiaotian; FU Donghao; GUO Kaitai; LIANG Jimin (School of Electronic Engineering, Xidian University, Xi'an 710071, China)
Source
Journal of Xidian University (《西安电子科技大学学报》, Peking University core journal), 2025, Issue 4, pp. 55-65 (11 pages)
Funding
National Natural Science Foundation of China Youth Program (62101416, 62301405); National Natural Science Foundation of China (62476205).