Journal Articles
6 articles found
Optimizing BERT for Bengali Emotion Classification: Evaluating Knowledge Distillation, Pruning, and Quantization
1
Authors: Md Hasibur Rahman, Mohammed Arif Uddin, Zinnat Fowzia Ria, Rashedur M. Rahman. Computer Modeling in Engineering & Sciences, 2025, Issue 2, pp. 1637-1666 (30 pages)
The rapid growth of digital data necessitates advanced natural language processing (NLP) models such as BERT (Bidirectional Encoder Representations from Transformers), known for its superior performance in text classification. However, BERT's size and computational demands limit its practicality, especially in resource-constrained settings. This research compresses the BERT base model for Bengali emotion classification through knowledge distillation (KD), pruning, and quantization. Despite Bengali being the sixth most spoken language globally, NLP research in this area is limited. Our approach addresses this gap by creating an efficient BERT-based model for Bengali text. We explored 20 combinations of KD, quantization, and pruning, resulting in improved speedup, fewer parameters, and reduced memory size. Our best results demonstrate significant improvements in both speed and efficiency. For instance, for mBERT, a Distil+Prune+Quant combination achieved a 3.87× speedup and a 4× compression ratio, reducing parameters from 178 M to 46 M and memory size from 711 MB to 178 MB. These results offer scalable solutions for NLP tasks in various languages and advance the field of model compression, making such models suitable for real-world applications in resource-limited environments.
Keywords: Bengali NLP; black-box distillation; emotion classification; model compression; post-training quantization; unstructured pruning
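The Distil+Prune+Quant recipe described in this abstract can be illustrated on a single weight matrix. The sketch below is a minimal numpy illustration, not the paper's actual pipeline: the function names (`magnitude_prune`, `quantize_int8`) and the sparsity/bit-width choices are assumptions, showing only how unstructured magnitude pruning and symmetric int8 post-training quantization compose and where the 4× memory saving comes from.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Unstructured pruning: zero out the smallest-magnitude weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

w_pruned = magnitude_prune(w, sparsity=0.5)     # half the weights become zero
q, scale = quantize_int8(w_pruned)              # 1 byte per weight instead of 4
w_deq = q.astype(np.float32) * scale            # dequantize for error inspection

print("nonzero fraction:", (w_pruned != 0).mean())
print("memory ratio int8/fp32:", q.nbytes / w.nbytes)
```

Pruning and quantization are complementary here: pruning removes parameters, while quantization shrinks the storage of those that remain, which is why the combination compounds into the large size reductions the abstract reports.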
H-ViT: hardware-friendly post-training quantization for efficient vision transformer inference
2
Authors: Jing Liu, Jiaqi Lai, Xiaodong Deng, Caigui Jiang, Nanning Zheng. Autonomous Intelligent Systems, 2025, Issue 1, pp. 1-9 (9 pages)
Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision tasks. However, these models are memory-consuming and computation-intensive, making their deployment and efficient inference on edge devices challenging. Model quantization is a promising approach to reducing model complexity. Prior works have explored quantization algorithms tailored to ViTs but retained floating-point (FP) scaling factors, which not only incur non-negligible re-quantization overhead but also prevent the quantized models from performing efficient integer-only inference. In this paper, we propose H-ViT, a dedicated post-training quantization scheme (e.g., symmetric uniform quantization and layer-wise quantization for both weights and part of the activations) that effectively quantizes ViTs with Power-of-Two (PoT) scaling factors, thus minimizing re-quantization overhead and memory consumption. In addition, observing serious inter-channel variation in LayerNorm inputs and outputs, we propose a systematic Power-of-Two quantization method that reduces the resulting performance degradation without extra hyper-parameters. Extensive experiments on multiple vision tasks with different model variants show that H-ViT offers comparable (or even slightly higher) INT8 quantization performance with PoT scaling factors compared to counterparts with floating-point scaling factors. For instance, we reach 78.43% top-1 accuracy with DeiT-S on ImageNet, and 51.6 box AP and 44.8 mask AP with Cascade Mask R-CNN (Swin-B) on COCO.
Keywords: Vision Transformers; post-training quantization; Power-of-Two scaling factors; hardware deployment
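The hardware appeal of Power-of-Two scaling factors described above is that rescaling becomes an integer bit shift instead of a floating-point multiply. The following is a minimal numpy sketch of that idea under my own assumptions (per-tensor symmetric quantization, nearest-power-of-two rounding of the calibrated scale); it is not H-ViT's actual algorithm.

```python
import numpy as np

def pot_scale(fp_scale):
    """Round a calibrated FP scale to the nearest power of two, so that
    (de)quantization can be implemented as integer bit shifts."""
    return 2.0 ** np.round(np.log2(fp_scale))

def quantize(x, scale, bits=8):
    """Symmetric uniform quantization with the given scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)

x = np.linspace(-3, 3, 7).astype(np.float32)
fp_scale = float(np.abs(x).max()) / 127.0   # ordinary min-max calibrated scale
p_scale = float(pot_scale(fp_scale))        # nearest power-of-two scale

q = quantize(x, p_scale)
shift = int(-np.log2(p_scale))              # p_scale == 2 ** -shift
# Dequantization x_hat = q * 2**-shift is a shift on integer hardware.
x_hat = q * 2.0 ** -shift
print("PoT scale:", p_scale, "shift:", shift)
print("max reconstruction error:", np.abs(x - x_hat).max())
```

Rounding the scale in log2 space costs at most a factor of sqrt(2) in step size relative to the calibrated FP scale, which is the trade the abstract claims is recoverable (comparable INT8 accuracy) in exchange for integer-only inference.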
A Survey of Post-Training Quantization Methods (Invited)
3
Authors: Zhang Junna, Wang Hongzun, Ding Chuntao. Computer Engineering (Peking University Core Journal), 2026, Issue 1, pp. 33-60 (28 pages)
Post-training quantization (PTQ) is an efficient model compression method that converts the parameters of a high-precision floating-point model into low-bit integer representations without retraining, using only a small amount of (or even no) unlabeled calibration data. Because it significantly reduces storage and computation costs while largely preserving the original model's inference accuracy, it has attracted wide attention in both academia and industry. This survey systematically summarizes PTQ research from four perspectives: quantization steps, method taxonomy, tool ecosystem, and application progress. First, it constructs a quantization pipeline framework covering dynamic-range statistics, quantization-parameter computation, weight and activation quantization, error optimization, and model generation. Second, it proposes a complete taxonomy of quantization methods, spanning quantization granularity, bit width, calibration method, and structure-oriented quantization. Third, it analyzes the tool ecosystem that supports PTQ at scale and discusses its value for hardware adaptation and engineering deployment. Finally, it summarizes progress in combining and applying PTQ methods and identifies the practical challenges PTQ faces, notably cross-modal consistency, semantic collapse at extremely low bit widths, and hardware adaptation. This summary of practical challenges not only reveals the limitations of current techniques but also points out important directions for future research. The survey provides academia and industry with a reference framework for PTQ methods, helping to promote the broad application of artificial intelligence in resource-constrained scenarios.
Keywords: post-training quantization; PTQ steps; PTQ method taxonomy; tool ecosystem; application progress
TP-ViT:truncated uniform-log2 quantizer and progressive bit-decline reconstruction for vision Transformer quantization
4
Authors: Xichuan Zhou, Sihuan Zhao, Rui Ding, Jiayu Shi, Jing Nie, Lihui Chen, Haijun Liu. ENGINEERING Information Technology & Electronic Engineering, 2026, Issue 1, pp. 47-58 (12 pages)
Vision Transformers (ViTs) have achieved remarkable success across various artificial-intelligence-based computer vision applications. However, their demanding computational and memory requirements pose significant challenges for deployment on resource-constrained edge devices. Although post-training quantization (PTQ) provides a promising solution by reducing model precision with minimal calibration data, aggressive low-bit quantization typically leads to substantial performance degradation. To address this challenge, we present TP-ViT, an innovative PTQ framework designed specifically for ViTs, built on a truncated uniform-log2 quantizer and progressive bit-decline reconstruction. It features two key technical contributions: (1) a truncated uniform-log2 quantizer, a novel quantization approach that effectively handles outlier values in post-Softmax activations and significantly reduces quantization errors; (2) a bit-decline optimization strategy, which employs transition weights to gradually reduce bit precision while maintaining model performance under extreme quantization conditions. Comprehensive experiments on image classification, object detection, and instance segmentation demonstrate TP-ViT's superior performance compared with state-of-the-art PTQ methods, particularly in challenging 3-bit quantization scenarios. Our framework achieves a notable improvement of 6.18 percentage points in top-1 accuracy for ViT-small under 3-bit quantization. These results validate TP-ViT's robustness and general applicability, paving the way for more efficient deployment of ViT models in computer vision applications on edge hardware.
Keywords: Vision Transformers; post-training quantization; block reconstruction; image classification; object detection; instance segmentation
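Why a log2 quantizer suits post-Softmax activations, as this abstract argues, is easy to see in isolation: softmax outputs are heavily skewed toward zero, so uniform levels are wasted. The numpy sketch below shows only the plain log2 building block under my own assumptions; the paper's "truncated uniform-log2" quantizer combines it with a uniform region and truncation, which is not reproduced here.

```python
import numpy as np

def log2_quantize(p, bits=4):
    """Log2 quantization for post-Softmax activations: values in (0, 1] map to
    q = round(-log2(p)), spending resolution on the small probabilities where
    a uniform quantizer would collapse everything to zero."""
    qmax = 2 ** bits - 1
    q = np.clip(np.round(-np.log2(np.maximum(p, 2.0 ** -qmax))), 0, qmax)
    return q.astype(np.int32)

def log2_dequantize(q):
    """Dequantize by powers of two: p_hat = 2**-q (a shift on hardware)."""
    return 2.0 ** -q.astype(np.float64)

logits = np.array([4.0, 2.0, 0.5, -1.0])
p = np.exp(logits) / np.exp(logits).sum()   # softmax probabilities, skewed to 0

q = log2_quantize(p, bits=4)
p_hat = log2_dequantize(q)
print("probs   :", np.round(p, 4))
print("dequant :", p_hat)
```

Because rounding happens in log space, the multiplicative error is bounded by a factor of sqrt(2) regardless of how small the probability is, whereas a uniform 4-bit quantizer would zero out every attention weight below 1/30.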
A Post-Training Quantization Strategy for Referring Image Segmentation
5
Authors: Yang Hang, Jiang Xiaoyan. Application Research of Computers (Peking University Core Journal), 2025, Issue 7, pp. 2025-2031 (7 pages)
Referring image segmentation (RIS) aims to segment the object described by a given sentence by jointly understanding visual and linguistic information, and has strong application prospects in interactive image editing and language-guided human-computer interaction. However, existing solutions tend to pursue high-performance models while neglecting practical deployment on resource-limited edge devices. To address this problem, we design and implement an effective post-training quantization framework. Specifically, we first analyze in depth the root cause of the performance collapse induced by naive quantization, and accordingly propose a dual-region balanced quantization strategy to handle the non-normal distribution of activations after the softmax and GELU operations in the visual encoder, together with a reorder-based grouped quantization strategy to cope with outlier activations in the linear layers of the text encoder. Extensive experiments at different bit widths on three benchmark datasets show that the proposed method is markedly superior to existing approaches. As the first work to design a quantization scheme specifically for referring image segmentation, it validates the feasibility of efficiently deploying RIS models on edge devices via post-training quantization.
Keywords: referring image segmentation; post-training quantization; cross-modal fusion; deep learning
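The reorder-based grouped quantization idea mentioned for the text encoder can be sketched generically: sort channels by dynamic range so outlier channels share a scale with similarly-ranged neighbors instead of inflating one global scale. This is a minimal numpy illustration under my own assumptions (per-group symmetric int8, magnitude-sorted grouping), not the paper's implementation.

```python
import numpy as np

def reorder_group_quantize(x, groups=4, bits=8):
    """Sort channels by dynamic range, then quantize each contiguous group of
    sorted channels with its own scale, isolating outlier channels."""
    qmax = 2 ** (bits - 1) - 1
    ranges = np.abs(x).max(axis=1)              # per-channel dynamic range
    order = np.argsort(ranges)                  # reorder channels by range
    x_sorted = x[order]
    q = np.empty_like(x_sorted, dtype=np.int32)
    scales = np.empty(groups)
    for g, chunk in enumerate(np.array_split(np.arange(len(x)), groups)):
        scales[g] = max(float(np.abs(x_sorted[chunk]).max()) / qmax, 1e-12)
        q[chunk] = np.clip(np.round(x_sorted[chunk] / scales[g]), -qmax, qmax)
    return q, scales, order

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 64))
x[3] *= 50.0                                    # one outlier channel

q, scales, order = reorder_group_quantize(x)
x_hat = np.empty_like(x)
for g, chunk in enumerate(np.array_split(np.arange(len(x)), 4)):
    x_hat[order[chunk]] = q[chunk] * scales[g]  # dequantize, undoing the reorder

# Naive per-tensor baseline: one scale, dominated by the outlier channel.
s = np.abs(x).max() / 127
x_naive = np.clip(np.round(x / s), -127, 127) * s
print("grouped mean error  :", np.abs(x - x_hat).mean())
print("per-tensor mean error:", np.abs(x - x_naive).mean())
```

The outlier channel only degrades the scale of its own group, so the mean error over ordinary channels stays near the no-outlier level, which is the failure mode of naive quantization that the abstract attributes its performance collapse to.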
An adaptive outlier correction quantization method for vision Transformers
6
Authors: Zheyang Li, Chaoxiang Lan, Kai Zhang, Wenming Tan, Ye Ren, Jun Xiao. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 10, pp. 1879-1895 (17 pages)
Transformers have demonstrated considerable success across various domains but are constrained by their significant computational and memory requirements, which poses challenges for deployment on resource-constrained devices. Quantization, as an effective model compression method, can significantly reduce the running time of Transformers on edge devices. Notably, Transformers exhibit more substantial outliers than convolutional neural networks, leading to uneven feature distributions across channels and tokens. To address this issue, we propose an adaptive outlier correction quantization (AOCQ) method for Transformers, which significantly alleviates the adverse effects of these outliers. AOCQ corrects the notable discrepancies across channels and tokens at three levels: operator level, framework level, and loss level. We introduce a new operator that equivalently balances the activations across channels, insert an extra stage that optimizes the activation quantization step at the framework level, and transfer the imbalanced activations across tokens and channels into the optimization of the model weights at the loss level. Our theoretical study shows that the method reduces quantization error, and its effectiveness is verified on various benchmark models and tasks. Notably, DeiT-Base with 8-bit post-training quantization (PTQ) achieves 81.57% accuracy, a drop of only 0.28 percentage points, while enjoying 4× faster runtime. Furthermore, the weights of Swin and DeiT on several tasks, including classification and object detection, can be post-quantized to ultra-low 4 bits with a minimal accuracy loss of 2%, while requiring nearly 8× less memory.
Keywords: Transformer; model compression and acceleration; post-training quantization; outlier
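The operator-level idea in this abstract, a new operator that "equivalently balances the activations across channels", can be illustrated with the generic identity y = x @ w = (x / s) @ (s·w) for any positive per-channel scale s. The numpy sketch below is my own analogy to that mechanism (the alpha-balanced choice of s resembles activation-weight smoothing techniques), not AOCQ's actual operator.

```python
import numpy as np

def balance_channels(x, w, alpha=0.5):
    """Shift per-channel activation outliers into the weights equivalently:
    y = x @ w == (x / s) @ (s[:, None] * w) for any positive per-channel s.
    alpha trades how much of the imbalance moves from activations to weights."""
    s = np.abs(x).max(axis=0) ** alpha / (np.abs(w).max(axis=1) ** (1 - alpha))
    s = np.maximum(s, 1e-8)
    return x / s, w * s[:, None], s

rng = np.random.default_rng(3)
x = rng.normal(size=(32, 8))
x[:, 0] *= 100.0                       # one outlier activation channel
w = rng.normal(size=(8, 4))

x_b, w_b, s = balance_channels(x, w)
# The matrix product is mathematically unchanged, but the activation ranges
# are far more even, so a single per-tensor activation scale is far less lossy.
before = np.abs(x).max(axis=0)
after = np.abs(x_b).max(axis=0)
print("channel range spread before:", before.max() / before.min())
print("channel range spread after :", after.max() / after.min())
```

Because the rescale is absorbed into the weights offline, it costs nothing at inference time, which is what makes this style of outlier correction attractive for post-training (rather than retraining-based) quantization.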