Journal Articles
2 articles found
1. An adaptive outlier correction quantization method for vision Transformers
Authors: Zheyang LI, Chaoxiang LAN, Kai ZHANG, Wenming TAN, Ye REN, Jun XIAO. Frontiers of Information Technology & Electronic Engineering, 2025, No. 10, pp. 1879-1895 (17 pages).
Transformers have demonstrated considerable success across various domains but are constrained by their significant computational and memory requirements. This poses challenges for deployment on resource-constrained devices. Quantization, as an effective model compression method, can significantly reduce the operational time of Transformers on edge devices. Notably, Transformers display more substantial outliers than convolutional neural networks, leading to uneven feature distribution among different channels and tokens. To address this issue, we propose an adaptive outlier correction quantization (AOCQ) method for Transformers, which significantly alleviates the adverse effects of these outliers. AOCQ adjusts the notable discrepancies in channels and tokens across three levels: operator level, framework level, and loss level. We introduce a new operator that equivalently balances the activations across different channels and insert an extra stage to optimize the activation quantization step on the framework level. Additionally, we transfer the imbalanced activations across tokens and channels to the optimization of model weights on the loss level. Based on the theoretical study, our method can reduce the quantization error. The effectiveness of the proposed method is verified on various benchmark models and tasks. Surprisingly, DeiT-Base with 8-bit post-training quantization (PTQ) can achieve 81.57% accuracy with a 0.28 percentage point drop while enjoying 4× faster runtime. Furthermore, the weights of Swin and DeiT on several tasks, including classification and object detection, can be post-quantized to ultra-low 4 bits, with a minimal accuracy loss of 2%, while requiring nearly 8× less memory.
Keywords: Transformer, model compression and acceleration, post-training quantization, outlier
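The abstract above mentions an operator that equivalently balances activations across channels before quantization. The sketch below is not the paper's AOCQ implementation; it only illustrates the generic idea of migrating per-channel activation outliers into the adjacent weights via an equivalent scaling and then quantizing the balanced activations. The function name, the alpha heuristic, and the symmetric fake-quantization scheme are assumptions for illustration.

```python
# Hypothetical sketch of channel balancing before activation quantization.
# Not the AOCQ operator from the paper: it shows the general idea of folding
# a per-channel scale into the next layer's weights so that the rescaled
# activations quantize with less outlier-induced error.
import torch

def balance_and_quantize(x, weight, n_bits=8, alpha=0.5):
    """x: (tokens, in_channels) activations; weight: (out_ch, in_ch)."""
    # Channels with large activation outliers get scaled down; the inverse
    # scale is absorbed by the weights, so the matrix product is unchanged.
    act_max = x.abs().amax(dim=0).clamp(min=1e-5)
    w_max = weight.abs().amax(dim=0).clamp(min=1e-5)
    scale = (act_max ** alpha) / (w_max ** (1 - alpha))   # assumed heuristic
    x_bal, w_bal = x / scale, weight * scale

    # Plain symmetric fake quantization of the balanced activations.
    qmax = 2 ** (n_bits - 1) - 1
    step = x_bal.abs().max() / qmax
    x_q = (x_bal / step).round().clamp(-qmax, qmax) * step
    return x_q @ w_bal.t()

# Usage: the balanced 8-bit path stays close to the FP32 reference even
# with an injected channel outlier.
x = torch.randn(16, 64)
x[:, 3] *= 20                          # simulate an outlier channel
w = torch.randn(128, 64)
err = (balance_and_quantize(x, w) - x @ w.t()).abs().mean()
print(f"mean abs error vs FP32: {err.item():.4f}")
```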
2. Pruning-aware Sparse Regularization for Network Pruning (cited 1 time)
Authors: Nan-Fei Jiang, Xu Zhao, Chao-Yang Zhao, Yong-Qi An, Ming Tang, Jin-Qiao Wang. Machine Intelligence Research (EI, CSCD), 2023, No. 1, pp. 109-120 (12 pages).
Structural neural network pruning aims to remove the redundant channels in deep convolutional neural networks (CNNs) by pruning the filters of less importance to the final output accuracy. To reduce the degradation of performance after pruning, many methods utilize the loss with sparse regularization to produce structured sparsity. In this paper, we analyze these sparsity-training-based methods and find that the regularization of unpruned channels is unnecessary. Moreover, it restricts the network's capacity, which leads to under-fitting. To solve this problem, we propose a novel pruning method, named Mask Sparsity, with pruning-aware sparse regularization. Mask Sparsity imposes the fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than all the filters of the model. Before the fine-grained sparse regularization of Mask Sparsity, we can use many methods to get the pruning mask, such as running the global sparse regularization. Mask Sparsity achieves a 63.03% floating-point operations (FLOPs) reduction on ResNet-110 by removing 60.34% of the parameters, with no top-1 accuracy loss on CIFAR-10. On ILSVRC-2012, Mask Sparsity reduces more than 51.07% FLOPs on ResNet-50, with only a loss of 0.76% in the top-1 accuracy. The code of this paper is released at https://github.com/CASIA-IVA-Lab/MaskSparsity. We have also integrated the code into a self-developed PyTorch pruning toolkit, named EasyPruner, at https://gitee.com/casia_iva_engineer/easypruner.
Keywords: deep learning, convolutional neural network (CNN), model compression and acceleration, network pruning, regularization
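The abstract above describes imposing sparse regularization only on the filters selected by a pruning mask, leaving unpruned channels unregularized. The following sketch is not the released MaskSparsity code; it assumes, for illustration, that the sparsity target is the BatchNorm scaling factor and that the mask comes from a simple magnitude rule, and the function name and lambda value are likewise hypothetical.

```python
# Hypothetical sketch of pruning-aware (masked) sparse regularization.
# Not the released MaskSparsity code: it only illustrates penalizing the
# BatchNorm scaling factors of channels already selected for pruning,
# while the unpruned channels receive no regularization.
import torch
import torch.nn as nn

def masked_sparsity_penalty(bn: nn.BatchNorm2d, prune_mask: torch.Tensor,
                            lam: float = 1e-4) -> torch.Tensor:
    """L1 penalty on the BN scales (gamma) of channels marked for pruning."""
    return lam * bn.weight[prune_mask].abs().sum()

# Usage inside a training step; the mask would normally come from a prior
# criterion (e.g. global sparse regularization), here a magnitude rule.
bn = nn.BatchNorm2d(64)
with torch.no_grad():
    bn.weight.uniform_(0.0, 1.0)       # stand-in for pretrained gammas
mask = bn.weight.detach() < 0.2        # channels selected to be pruned
x = torch.randn(8, 64, 16, 16)
task_loss = bn(x).pow(2).mean()        # stand-in for the real task loss
loss = task_loss + masked_sparsity_penalty(bn, mask)
loss.backward()
```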