Journal Articles
2 articles found
1. H-ViT: hardware-friendly post-training quantization for efficient vision transformer inference
Authors: Jing Liu, Jiaqi Lai, Xiaodong Deng, Caigui Jiang, Nanning Zheng. Autonomous Intelligent Systems, 2025, No. 1, pp. 1-9.
Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision tasks. However, these models are memory-consuming and computation-intensive, making their deployment and efficient inference on edge devices challenging. Model quantization is a promising approach to reducing model complexity. Prior works have explored tailored quantization algorithms for ViTs but unfortunately retained floating-point (FP) scaling factors, which not only incur non-negligible re-quantization overhead but also prevent the quantized models from performing efficient integer-only inference. In this paper, we propose H-ViT, a dedicated post-training quantization scheme (e.g., symmetric uniform quantization and layer-wise quantization for both weights and part of the activations) that effectively quantizes ViTs with fewer Power-of-Two (PoT) scaling factors, thus minimizing the re-quantization overhead and memory consumption. In addition, observing serious inter-channel variation in LayerNorm inputs and outputs, we propose Power-of-Two quantization (PTQ), a systematic method for reducing the performance degradation without hyper-parameters. Extensive experiments on multiple vision tasks with different model variants show that H-ViT offers comparable (or even slightly higher) INT8 quantization performance with PoT scaling factors compared to its counterpart with floating-point scaling factors. For instance, we reach 78.43% top-1 accuracy with DeiT-S on ImageNet, and 51.6 box AP and 44.8 mask AP with Cascade Mask R-CNN (Swin-B) on COCO.
Keywords: Vision Transformers; post-training quantization; Power-of-Two scaling factors; hardware deployment
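The core idea in the abstract, replacing floating-point scaling factors with Power-of-Two ones so that re-quantization reduces to a bit shift, can be sketched as below. This is a minimal illustration of symmetric uniform quantization with a PoT scale; the function name and the exponent-rounding rule are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def pot_symmetric_quantize(x, n_bits=8):
    """Symmetric uniform quantization with a Power-of-Two (PoT) scale.
    Illustrative sketch: compute the ideal FP scale, then round its
    log2 exponent so the scale is an exact power of two (on hardware,
    multiplying/dividing by it becomes a shift)."""
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 127 for INT8
    fp_scale = np.max(np.abs(x)) / qmax           # ideal floating-point scale
    pot_scale = 2.0 ** np.round(np.log2(fp_scale))  # snap to power of two
    q = np.clip(np.round(x / pot_scale), -qmax - 1, qmax).astype(np.int8)
    return q, pot_scale

np.random.seed(0)
x = np.random.randn(64).astype(np.float32)
q, s = pot_symmetric_quantize(x)
x_hat = q.astype(np.float32) * s                  # dequantize
```

Snapping the exponent rather than keeping an arbitrary FP scale trades a small extra rounding error for integer-only, shift-based re-quantization between layers.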
2. Branch Convolution Quantization for Object Detection (cited: 1)
Authors: Miao Li, Feng Zhang, Cuiting Zhang. Machine Intelligence Research (EI, CSCD), 2024, No. 6, pp. 1192-1200.
Quantization is one of the research topics on lightweight, edge-deployed convolutional neural networks (CNNs). Usually, the activation and weight bit-widths differ between layers to ensure good CNN performance, meaning that dedicated hardware has to be designed for specific layers. In this work, we explore a unified quantization method with extremely low-bit quantized weights for all layers. We use thermometer coding to convert the 8-bit RGB input images to the same bit-width as the activations of the middle layers. For quantizing the results of the last layer, we propose a branch convolution quantization (BCQ) method. Together with the extremely low-bit quantization of the weights, deploying the network on circuits becomes simpler than in other works and consistent across all layers, including the first and the last. Taking tiny_yolo_v3 and yolo_v3 on the VOC and COCO datasets as examples, we verify the feasibility of thermometer coding on input images and branch convolution quantization on output results. Finally, tiny_yolo_v3 is deployed on an FPGA, which further demonstrates the high performance of the proposed algorithm on hardware.
Keywords: branch convolution quantization; thermometer coding; extremely low-bit quantization; hardware deployment; object detection
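Thermometer coding, as described in the abstract, expands each 8-bit pixel into a set of binary channels so the input matches the low bit-width of the middle-layer activations. A minimal sketch follows; the channel count and threshold spacing are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def thermometer_encode(img_u8, n_levels=15):
    """Thermometer-code 8-bit pixels into n_levels binary channels:
    channel t is 1 wherever the pixel meets or exceeds the t-th
    threshold, producing a monotone run of 1s then 0s per pixel."""
    thresholds = np.arange(1, n_levels + 1) * (256 // (n_levels + 1))
    # Broadcast: (..., 1) >= (n_levels,) -> (..., n_levels)
    return (img_u8[..., None] >= thresholds).astype(np.uint8)

img = np.array([[0, 128, 255]], dtype=np.uint8)
code = thermometer_encode(img)  # shape (1, 3, 15)
```

A dark pixel lights no channels and a saturated one lights all of them, so every channel is already a 1-bit signal that the first layer can consume without a special high-precision input path.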