Journal Articles
466 articles found.
1. Optimizing Fine-Tuning in Quantized Language Models: An In-Depth Analysis of Key Variables
Authors: Ao Shen, Zhiquan Lai, Dongsheng Li, Xiaoyu Hu. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 307-325.
Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, shrinking the trainable parameters of a larger layer preserves fine-tuning accuracy better than doing so in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
Keywords: Large-scale Language Models; Parameter-Efficient Fine-Tuning; parameter quantization; key variables; trainable parameters; experimental analysis
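The adapter-rank trade-off discussed in the abstract can be made concrete with a minimal LoRA sketch. The layer shapes and ranks below are illustrative toy values, not taken from the paper: a "large" MLP-style projection gets a low rank, a "small" attention-style projection a higher one, and the trainable-parameter count is rank * (d_out + d_in) instead of d_out * d_in.

```python
import numpy as np

def lora_delta(weight_shape, rank, alpha=16, rng=None):
    """Build a LoRA update dW = (alpha/rank) * B @ A for a frozen weight.

    A and B are the only trainable matrices; together they hold
    rank * (d_out + d_in) parameters versus d_out * d_in for full tuning.
    """
    rng = rng or np.random.default_rng(0)
    d_out, d_in = weight_shape
    A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
    B = np.zeros((d_out, rank))                   # trainable, zero init -> dW starts at 0
    return (alpha / rank) * B @ A, A.size + B.size

# Toy sizes: an MLP-like layer tolerates a small rank, an attention-like
# layer is given a larger one (mirroring the paper's finding qualitatively).
delta_mlp, mlp_params = lora_delta((64, 256), rank=2)
delta_attn, attn_params = lora_delta((64, 64), rank=8)
```

Because B is zero-initialized, the update dW is exactly zero before training, so inserting the adapter never perturbs the frozen base weights.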
2. T-S-fuzzy-model-based quantized control for nonlinear networked control systems
Authors: 褚红燕, 费树岷, 陈海霞, 翟军勇. Journal of Southeast University (English Edition) (EI, CAS), 2010, Issue 1, pp. 137-141.
In order to overcome data quantization, network-induced delay, network packet dropouts and wrong sequences in the nonlinear networked control system, a novel nonlinear networked control system model is built by the T-S fuzzy method. Two time-varying quantizers are added in the model. The key analysis steps in the method are to construct an improved interval-delay-dependent Lyapunov functional and to introduce the free-weighting matrix. By making use of the parallel distributed compensation technology and the convexity of the matrix function, improved criteria for stabilization and stability are obtained. Simulation experiments show that the parameters of the controllers and quantizers satisfying a certain performance can be obtained by solving a set of LMIs. The application to a nonlinear mass-spring system is provided to show that the proposed method is effective.
Keywords: T-S fuzzy model; linear matrix inequalities (LMIs); quantizers
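Quantized control of this kind typically passes signals through a logarithmic quantizer whose levels are geometrically spaced. A minimal static sketch is below; the density parameter rho and unit level u0 are illustrative, and the paper's quantizers are additionally time-varying.

```python
import math

def log_quantize(v, rho=0.5, u0=1.0):
    """Map v to the nearest level sign(v) * u0 * rho**i (nearest in the
    log domain) of a logarithmic quantizer with density rho in (0, 1)."""
    if v == 0:
        return 0.0
    sign, mag = (1.0, v) if v > 0 else (-1.0, -v)
    i = round(math.log(mag / u0, rho))
    return sign * u0 * rho ** i

# Sector bound used in quantized-control analysis: the relative error of
# the ideal logarithmic quantizer is limited by delta = (1-rho)/(1+rho).
delta = (1 - 0.5) / (1 + 0.5)
```

Levels of the quantizer are fixed points: quantizing 0.5 or -0.25 (which are u0*rho and -u0*rho**2) returns them unchanged.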
3. Establishing formal state space models via quantization for quantum control systems (cited: 2)
Authors: Dong Daoyi, Chen Zonghai. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2005, Issue 2, pp. 398-402.
Formal state space models of quantum control systems are deduced, and a scheme to establish such models via quantization is proposed. State evolution of quantum control systems must accord with Schrödinger equations, so it is foremost to obtain the Hamiltonian operators of the systems. There are corresponding relations between operators of quantum systems and physical quantities of classical systems, such as momentum, energy and the Hamiltonian; therefore Schrödinger-equation models of the corresponding quantum control systems can be obtained from classical control systems via quantization, and formal state space models can then be established through a suitable transformation of these Schrödinger equations. This method provides a new kind of path for modeling in quantum control.
Keywords: quantum control systems; formal state space models; quantization
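The quantization step the abstract describes is the standard canonical correspondence (the notation below is the textbook convention, not the paper's): classical quantities are promoted to operators, and state evolution obeys the Schrödinger equation.

```latex
q \to \hat{q}, \qquad
p \to \hat{p} = -i\hbar\,\frac{\partial}{\partial q}, \qquad
H(q, p) \to \hat{H}(\hat{q}, \hat{p}), \qquad
i\hbar\,\frac{\partial}{\partial t}\,\lvert \psi(t) \rangle = \hat{H}\,\lvert \psi(t) \rangle
```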
4. Pattern-Moving-Based Parameter Identification of Output Error Models with Multi-Threshold Quantized Observations (cited: 2)
Authors: Xiangquan Li, Zhengguang Xu, Cheng Han, Ning Li. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, Issue 3, pp. 1807-1825.
This paper addresses a modified auxiliary model stochastic gradient recursive parameter identification algorithm (M-AM-SGRPIA) for a class of single-input single-output (SISO) linear output error models with multi-threshold quantized observations, and proves the convergence of the designed algorithm. A pattern-moving-based system dynamics description method with hybrid metrics is proposed for a kind of practical single-input multiple-output (SIMO) or SISO nonlinear systems, and a SISO linear output error model with multi-threshold quantized observations is adopted to approximate the unknown system. The system input design is accomplished using the measurement technology of random repeatability tests, and the probabilistic characteristic of the explicit metric value is employed to estimate the implicit metric value of the pattern class variable. A modified auxiliary model stochastic gradient recursive algorithm (M-AM-SGRA) is designed to identify the model parameters, and the contraction mapping principle proves its convergence. Two numerical examples are given to demonstrate the feasibility and effectiveness of the achieved identification algorithm.
Keywords: pattern moving; multi-threshold quantized observations; output error model; auxiliary model; parameter identification
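A multi-threshold quantized observation replaces the exact output with the index of the interval it falls into; the identification algorithm only ever sees this index. A minimal sketch, with illustrative thresholds:

```python
import bisect

def quantize_observation(y, thresholds):
    """Return the index of the interval (-inf, t1], (t1, t2], ..., (tk, inf)
    containing the true output y; only this index is observed."""
    return bisect.bisect_left(thresholds, y)

thresholds = [-1.0, 0.0, 1.0]                    # k = 3 thresholds -> 4 cells
obs = [quantize_observation(y, thresholds) for y in (-2.0, -0.5, 0.3, 5.0)]
```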
5. Using a Toy Model to Improve the Quantization of Gravity and Field Theories (cited: 1)
Authors: John R. Klauder. Journal of High Energy Physics, Gravitation and Cosmology, 2022, Issue 2, pp. 303-308.
A half-harmonic oscillator, so named because its position coordinate is strictly positive, has been quantized, and that quantization has been shown to be physically correct. This positive result was found using affine quantization (AQ). The main purpose of this paper is to compare results of this new quantization procedure with those of canonical quantization (CQ). Using Ashtekar-like classical variables and CQ, we quantize the same toy model. While these two quantizations lead to different results, they both would reduce to the same classical Hamiltonian if ħ → 0. Since these two quantizations have differing results, only one of them can be physically correct. Two brief sections also illustrate how AQ can correctly help quantum gravity and the quantization of most field theory problems.
Keywords: toy model; affine quantization (AQ); canonical quantization (CQ)
6. The CP^1 nonlinear sigma model with Chern-Simons term in the Faddeev-Jackiw quantization formalism
Authors: 王永龙, 李子平. Chinese Physics B (SCIE, EI, CAS, CSCD), 2006, Issue 9, pp. 1976-1980.
Using the Faddeev-Jackiw (FJ) quantization method, this paper treats the CP^1 nonlinear sigma model with Chern-Simons term. The generalized FJ brackets are obtained in the framework of this quantization method, and they agree with the results obtained by using Dirac's method.
Keywords: Faddeev-Jackiw quantization method; CP^1 nonlinear sigma model; Chern-Simons theories; constrained systems
7. A Novel Quantization and Model Compression Approach for Hardware Accelerators in Edge Computing
Authors: Fangzhou He, Ke Ding, Dingjiang Yan, Jie Li, Jiajun Wang, Mingzhe Chen. Computers, Materials & Continua (SCIE, EI), 2024, Issue 8, pp. 3021-3045.
The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference for Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation, have large memory overhead, and their efficiency is bounded by the bottleneck of computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, two-stage model compression is developed to effectively reduce memory requirements and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme is leveraged to replace bit-shifting with only indexing and addition operations, achieving low-latency computation and enabling efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2×-10× reduction in both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
Keywords: edge computing; model compression; hardware accelerator; power-of-two quantization
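Power-of-Two quantization rounds each weight to a signed power of two so that multiplication can become a bit shift (or, in the paper's P-LUT scheme, a table lookup). The sketch below shows only the basic rounding idea with an illustrative exponent range; it is not the paper's IOS-PoT method.

```python
import math

def pot_quantize(w, min_exp=-7, max_exp=0):
    """Round w to the nearest signed power of two (nearest in the log2
    domain), clamping the exponent to [min_exp, max_exp]."""
    if w == 0:
        return 0.0
    e = round(math.log2(abs(w)))
    e = max(min_exp, min(max_exp, e))
    return math.copysign(2.0 ** e, w)

q = [pot_quantize(w) for w in (0.3, -0.7, 0.05)]
```

Multiplying an activation by a quantized weight 2**e then reduces to shifting by |e| bits in integer arithmetic, which is the hardware saving PoT schemes target.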
8. Metaheuristics with Vector Quantization Enabled Codebook Compression Model for Secure Industrial Embedded Environment
Authors: Adepu Shravan Kumar, S. Srinivasan. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 6, pp. 3607-3620.
The Industrial Internet of Things (IIoT) has swiftly evolved, and picture data collected by terminal devices or IoT nodes is tied to users' private data. The use of image sensors as an automation tool for the IIoT is becoming increasingly common. Because such a deployment transfers an enormous number of photographs at any one time, one of its most significant challenges is reducing the total quantity of data sent, and hence the bandwidth consumed, without compromising image quality. Image compression in the sensor expedites data transfer while reducing bandwidth use. Traditional methods of protecting sensitive data are rendered less effective in an IoT-dominated environment owing to the involvement of third parties. An image encryption model provides a safe and adaptable way to protect the confidentiality of picture transformation and storage inside an IIoT system, helping to keep image datasets safe. The Linde-Buzo-Gray (LBG) methodology is a widely used vector quantization (VQ) algorithm for image compression. The purpose of this research is therefore to create an artificial hummingbird optimization approach that combines LBG-enabled codebook creation and encryption (AHBO-LBGCCE) for use in an IIoT setting. First, the AHBO-LBGCCE method uses the LBG model in conjunction with the AHBO algorithm to construct the VQ. The Burrows-Wheeler Transform (BWT) model is used to accomplish codebook compression, and the Blowfish algorithm carries out the encryption procedure to attain security. A comprehensive experimental investigation verifies the effectiveness of the proposed algorithm in comparison to other algorithms, with the results examined from a variety of perspectives.
Keywords: codebook compression; Industrial Internet of Things; LBG model; metaheuristics; vector quantization
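An LBG/VQ codebook update alternates nearest-codeword assignment with centroid updates, exactly like a Lloyd/k-means step. A minimal 1-D sketch with illustrative data and a two-entry codebook (real LBG works on image blocks and also splits codewords to grow the codebook):

```python
def lbg_step(data, codebook):
    """One Lloyd iteration of LBG: assign each sample to its nearest
    codeword, then move each codeword to the mean of its cell."""
    cells = {i: [] for i in range(len(codebook))}
    for x in data:
        i = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
        cells[i].append(x)
    # Empty cells keep their old codeword.
    return [sum(c) / len(c) if c else codebook[i] for i, c in cells.items()]

data = [0.1, 0.2, 0.9, 1.0]
codebook = [0.0, 1.0]
for _ in range(5):
    codebook = lbg_step(data, codebook)
```

After compression, only codeword indices are transmitted, which is what makes the codebook itself the natural object to compress and encrypt in the pipeline described above.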
9. Gravitationally Quantized Orbits in the Solar System: Computations Based on the Global Polytropic Model
Authors: Vassilis Geroyannis, Florendia Valvi, Themis Dallas. International Journal of Astronomy and Astrophysics, 2014, Issue 3, pp. 464-473.
The so-called "global polytropic model" is based on the assumption of hydrostatic equilibrium for the solar system, or for a planet's system of satellites (like the Jovian system), described by the Lane-Emden differential equation. A polytropic sphere of polytropic index n and radius R1 represents the central component S1 (Sun or planet) of a polytropic configuration whose further components are the polytropic spherical shells S2, S3, ..., defined by the pairs of radii (R1, R2), (R2, R3), ..., respectively. R1, R2, R3, ... are the roots of the real part Re(θ) of the complex Lane-Emden function θ. Each polytropic shell is assumed to be an appropriate place for a planet, or a planet's satellite, to be "born" and "live". This scenario has been studied numerically for the cases of the solar and the Jovian systems. In the present paper, the Lane-Emden differential equation is solved numerically in the complex plane using the Fortran code DCRKF54 (a modified Runge-Kutta-Fehlberg code of fourth and fifth order for solving initial value problems in the complex plane along complex paths). We include in our numerical study some trans-Neptunian objects.
Keywords: complex-plane strategy; global polytropic model; Jovian system; quantized orbits; solar system; trans-Neptunian objects
10. BER performance analysis of non-Hermitian symmetry OFDM-VLC systems with ADC quantization noise
Authors: WANG Zhongpeng, AI Caihua, ZHANG Lijuan. Optoelectronics Letters, 2025, Issue 11, pp. 677-683.
Quantization noise caused by the analog-to-digital converter (ADC) degrades the reliability of communication systems. In this paper, a quantized non-Hermitian symmetry (NHS) orthogonal frequency-division multiplexing-based visible light communication (OFDM-VLC) system is presented. In order to analyze the effect of ADC resolution on NHS OFDM-VLC, a quantized mathematical model of NHS OFDM-VLC is established. Based on the proposed quantized model, a closed-form bit error rate (BER) expression is derived. Theoretical analysis and simulation results both confirm the effectiveness of the obtained BER formula for high-resolution ADCs. In addition, channel coding helps compensate for the BER performance loss caused by using a lower-resolution ADC.
Keywords: quantized mathematical model; communication systems; bit error rate (BER); reliability performance degradation; non-Hermitian symmetry; ADC quantization; OFDM-VLC
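The resolution dependence the abstract analyzes follows from the standard uniform-quantizer noise model: noise power is Δ²/12, so SQNR grows by about 6.02 dB per bit. A quick numeric check, assuming a full-scale uniform input rather than the paper's NHS OFDM signal statistics:

```python
import math

def sqnr_db_uniform(bits):
    """SQNR (dB) of a b-bit uniform quantizer driven by a full-scale
    uniform input: signal power / (delta**2 / 12) = 2**(2*bits)."""
    return 10 * math.log10(4.0 ** bits)

gain_per_bit = sqnr_db_uniform(9) - sqnr_db_uniform(8)  # ~6.02 dB per extra bit
```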
11. Optimizing BERT for Bengali Emotion Classification: Evaluating Knowledge Distillation, Pruning, and Quantization
Authors: Md Hasibur Rahman, Mohammed Arif Uddin, Zinnat Fowzia Ria, Rashedur M. Rahman. Computer Modeling in Engineering & Sciences, 2025, Issue 2, pp. 1637-1666.
The rapid growth of digital data necessitates advanced natural language processing (NLP) models like BERT (Bidirectional Encoder Representations from Transformers), known for its superior performance in text classification. However, BERT's size and computational demands limit its practicality, especially in resource-constrained settings. This research compresses the BERT base model for Bengali emotion classification through knowledge distillation (KD), pruning, and quantization techniques. Despite Bengali being the sixth most spoken language globally, NLP research in this area is limited. Our approach addresses this gap by creating an efficient BERT-based model for Bengali text. We explored 20 combinations of KD, quantization, and pruning, resulting in improved speedup, fewer parameters, and reduced memory size. Our best results demonstrate significant improvements in both speed and efficiency. For instance, in the case of mBERT, we achieved a 3.87× speedup and a 4× compression ratio with a Distil+Prune+Quant combination that reduced parameters from 178M to 46M, while the memory size decreased from 711 MB to 178 MB. These results offer scalable solutions for NLP tasks in various languages and advance the field of model compression, making these models suitable for real-world applications in resource-limited environments.
Keywords: Bengali NLP; black-box distillation; emotion classification; model compression; post-training quantization; unstructured pruning
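The post-training quantization component evaluated here maps fp32 weights to int8 with a per-tensor scale; a minimal symmetric sketch (this is the generic technique, not the exact recipe used in the paper):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization: w ~ scale * q."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0  # guard against all-zero w
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = scale * q  # dequantized approximation, error bounded by scale/2
```

Storing q (1 byte) instead of w (4 bytes) is what produces the roughly 4× memory reduction the abstract reports for the quantized variants.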
12. Relativistic two-fluid hydrodynamics with quantized vorticity from the nonlinear Klein-Gordon equation
Authors: Chi Xiong, Kerson Huang. Communications in Theoretical Physics, 2025, Issue 2, pp. 159-169.
We consider a relativistic two-fluid model of superfluidity, in which the superfluid is described by an order parameter that is a complex scalar field satisfying the nonlinear Klein-Gordon equation (NLKG). The coupling to the normal fluid is introduced via a covariant current-current interaction, which results in the addition of an effective potential whose imaginary part describes particle transfer between superfluid and normal fluid. Quantized vorticity arises in a class of singular solutions, and the related vortex dynamics is incorporated in the modified NLKG, facilitating numerical analysis that is usually very complicated in the phenomenology of vortex filaments. The dual transformation to a string theory description (Kalb-Ramond) of quantum vorticity, the Magnus force, and the mutual friction between quantized vortices and normal fluid are also studied.
Keywords: relativistic superfluidity; nonlinear Klein-Gordon field theory; quantized vortices; two-fluid model; Kalb-Ramond field; global string
13. A Flexible Quantized Convolver Based on FPGA
Authors: 金华, 蔡新颖, 刘玖金, 宋雪桦, 王昌达. 《微电子学与计算机》 (Microelectronics & Computer), 2026, Issue 2, pp. 57-71.
To address the accuracy loss, computational throughput, and convolution efficiency problems encountered when deploying CNNs on resource-constrained edge devices, a flexible quantized convolver is proposed. The method uses an HA-MPLF quantization strategy to fold BN (Batch Normalization) layers into the convolution layers and assigns an optimal precision to each layer's filters, striking a balance between accuracy and computational performance. A computation method based on convolution decomposition is also proposed to efficiently support filters of different sizes. On the FPGA (Field-Programmable Gate Array) platform, the quantized convolver adopts a channel-first computation strategy combined with DSP (Digital Signal Processor) packing and cascading techniques, significantly improving resource utilization. Experiments on a ZCU102 FPGA show that the method reaches 90.13%, 89.51%, and 93.33% accuracy on MobileNet-V2, ResNet18, and ResNet50, respectively, while significantly increasing throughput, providing an efficient solution for deploying CNNs on edge devices.
Keywords: convolutional neural networks; edge devices; model quantization; convolution operation
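BN folding of the kind used here absorbs a BatchNorm layer into the preceding convolution's weights and bias, so the folded network computes the same function with one fewer layer. A minimal per-channel sketch (symbols follow the usual BN definition; the numeric values are illustrative):

```python
def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(y) = gamma*(y - mean)/sqrt(var + eps) + beta into the
    preceding conv's per-channel weight w and bias b."""
    s = gamma / (var + eps) ** 0.5
    return w * s, (b - mean) * s + beta

w_f, b_f = fold_bn(w=2.0, b=0.5, gamma=1.5, beta=0.1, mean=0.2, var=0.04)
```

The folded pair (w_f, b_f) reproduces conv-then-BN exactly, which is why folding is done before quantization: only one set of weights then needs a quantization grid.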
14. Research on Post-Training Quantization for ViT with Adaptive Dynamic Scale Selection
Authors: 裴颂文, 彭宇昂, 刘方鑫, 陈铭松, 张波. 《小型微型计算机系统》 (Journal of Chinese Computer Systems, PKU Core), 2026, Issue 1, pp. 142-149.
Post-training quantization requires no retraining of the neural network and depends little on the dataset, making it a lightweight and practical model compression technique. However, existing quantization schemes fail to fit the distribution of post-Softmax activations, and accuracy inevitably drops after re-parameterizing post-LayerNorm activations. This paper therefore proposes DAQ-ViT, a post-training quantization framework for Transformers that adaptively and dynamically selects quantization scales. DAQ-ViT first introduces a skewness-based scaling-factor distribution selector, addressing the accuracy loss caused by the pronounced inter-channel variation of post-LayerNorm activations. Second, targeting the distribution characteristics of post-Softmax and post-GELU activations, a Sigmoid quantizer matched to those distributions is proposed. In addition, a distribution-aware detector adaptively senses the activation distribution and dynamically chooses between Sigmoid quantization and log2 quantization. Experiments show that, without output reconstruction, 4-bit DAQ-ViT improves accuracy over PTQ4ViT by 20% on DeiT-Tiny and 35% on DeiT-Small.
Keywords: model compression; model quantization; post-training quantization; image classification; vision Transformer (ViT)
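Log2 quantization, one of the two quantizers DAQ-ViT chooses between, exploits the fact that a post-Softmax value lies in (0, 1] and maps it to a power of two. A minimal sketch; the bit width and rounding policy below are generic illustrations, not the paper's exact quantizer:

```python
import math

def log2_quantize(p, bits=4):
    """Quantize p in (0, 1] to 2**(-k) with k in {0, ..., 2**bits - 1}."""
    k = round(-math.log2(max(p, 2.0 ** -(2 ** bits - 1))))  # clamp tiny p
    k = min(max(k, 0), 2 ** bits - 1)
    return 2.0 ** -k

q = [log2_quantize(p) for p in (0.9, 0.3, 0.01)]
```

Because the grid is geometric, small probabilities keep relative precision that a uniform grid would destroy, which is why log-domain quantizers suit post-Softmax activations.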
15. Low-Bit Quantization of Large Language Models: History, Present, and Future, Taking Complex-Domain Quantization as an Example
Authors: 丛培壮, 王飞宇, 王国安, 王砚舒, 郑策, 杨仝. 《计算机研究与发展》 (Journal of Computer Research and Development, PKU Core), 2026, Issue 2, pp. 276-293.
As the parameter scale of large language models (LLMs) grows exponentially, model deployment and inference face severe memory and compute challenges. Quantization, a core model compression method, significantly reduces storage requirements and computational overhead by lowering the numerical precision of weights and activations. This paper first reviews the evolution of quantization, from classical Int8/Int4 methods to state-of-the-art ultra-low-bit algorithms, summarizing the technical characteristics and performance trends of representative methods, and points out that conventional real-domain quantization is limited by discretization error at extremely low bit widths and struggles to break through its performance ceiling. It then systematically surveys a line of work on complex-domain quantization, which proposes a quantization paradigm over the complex field: by introducing magnitude and phase as two degrees of freedom in the parameter representation, the model's expressive space is markedly enlarged. Furthermore, by analogy with the classical signal-processing paradigm of obtaining a stable representation through the Fourier transform and low-pass filtering, a technical route is proposed in which a real-valued model undergoes a complex-domain transform followed by complex-domain quantization, achieving multiplication-free, stable inference. Experimental results show that this scheme outperforms existing ultra-low-bit quantization methods on multiple benchmark datasets, effectively breaking through the performance ceiling of real-domain models and demonstrating the potential of complex-domain quantization for efficient modeling with preserved performance. Overall, through a systematic analysis of the evolution of quantization and the complex-domain line of work, the paper aims to reveal the development patterns and future trends of ultra-low-bit quantization and to serve as a reference for the theory and engineering of efficient large models.
Keywords: large language models; model quantization; low-bit quantization; model compression; complex-valued models
16. A Memory-Alignment-Based Mixed-Precision Quantization Method for Large Models
Authors: 李章明, 关伟凡, 常政威, 张凌浩, 胡庆浩. 《图学学报》 (Journal of Graphics, PKU Core), 2026, Issue 1, pp. 39-46.
As large models keep growing, the memory footprint and computational overhead of inference become major challenges. Model quantization is an effective way to reduce resource consumption, but existing methods suffer from inadequate outlier handling, significant quantization accuracy loss, and inefficient memory access during weight quantization. This paper proposes a memory-aligned mixed-precision quantization method for large models that represents model parameters at different bit widths, reducing storage while mitigating quantization-induced accuracy loss. Specifically, weight outliers are partitioned via group-level saliency analysis: parameters are grouped and aligned to single-instruction multiple-data (SIMD) units, and each group is quantized to 8 bits or 2 bits according to its saliency. To counter the accuracy loss that 2-bit quantization may cause, a block-wise quantization compensation strategy is introduced. In addition, an efficient mixed-precision weight packing and storage scheme is designed that records each block's bit-width type in a bitmap, supporting random access. Experimental results show that the method markedly reduces memory usage and improves computational efficiency while preserving accuracy. Validated on Llama2-7B, 13B, and 70B, perplexity (PPL) on the WikiText2 and C4 datasets drops by 8.13, 2.84, 1.37, and 5.80 relative to the state of the art, and the quantized 70B model's weight storage shrinks by about 87% relative to BF16. Average accuracy on seven QA datasets improves by 6.24%. These results show that the memory-aligned mixed-precision approach simultaneously improves compression ratio, memory-access efficiency, and model performance.
Keywords: large-model compression; post-training quantization; low-bit quantization; mixed-precision quantization; outlier partitioning
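Saliency-driven mixed precision of the kind described above can be sketched by quantizing high-saliency groups to an int8 grid and the rest to a 2-bit grid. The grouping, saliency metric (max |w| per group), and keep ratio below are simplifications for illustration, not the paper's exact criteria, and the compensation step is omitted:

```python
import numpy as np

def mixed_precision_quantize(w, group=4, keep_ratio=0.5):
    """Split w into SIMD-aligned groups, rank groups by max|w| (saliency),
    quantize the top groups to int8 and the rest to 2-bit {-1, 0, +1}*s."""
    g = w.reshape(-1, group).astype(np.float64)
    saliency = np.abs(g).max(axis=1)
    n_hi = max(1, int(round(len(g) * keep_ratio)))
    hi = set(np.argsort(saliency)[::-1][:n_hi].tolist())
    out = np.empty_like(g)
    for i, row in enumerate(g):
        levels = 127 if i in hi else 1          # int8 grid vs 2-bit grid
        s = np.abs(row).max() / levels or 1.0   # per-group scale
        out[i] = np.round(row / s) * s
    return out.reshape(w.shape), hi

w = np.array([0.1, -0.2, 0.05, 0.15, 2.0, -1.5, 0.8, 1.2])
q, hi_groups = mixed_precision_quantize(w, keep_ratio=0.5)
```

Aligning groups to the SIMD width means a whole group shares one bit width, so the per-block bitmap lookup and vectorized dequantization stay cheap.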
17. A Hybrid Compression Method for CNN Models Oriented to Reconfigurable Architectures
Authors: 刘朋飞, 蒋林, 李远成, 吴海. 《现代电子技术》 (Modern Electronics Technique, PKU Core), 2026, Issue 1, pp. 167-173.
As convolutional neural networks keep scaling up, their parameter counts and computational loads increase significantly, creating severe memory-access bottlenecks in hardware and limiting computational efficiency. To solve this problem, a new hybrid CNN compression method oriented to reconfigurable architectures is proposed. The method adopts a prune-then-quantize strategy: filter pruning based on first-order Taylor expansion, threshold-based weight pruning for fully connected layers, and a mixed-precision adaptive quantization strategy jointly reduce the parameter count and computational complexity, and the compressed models are deployed on a self-developed reconfigurable processor. Experimental results show that the method achieves compression ratios of 31.4× on VGG16 and 7.9× on ResNet18, with accuracy drops of only 1.20% and 0.74%, respectively. On a reconfigurable array processor built on a Virtex UltraScale VU440 FPGA development board, the execution cycles of the compressed VGG16 model are reduced by up to 62.7%, demonstrating that the method suits resource-limited edge computing devices.
Keywords: convolutional neural networks; model compression; structured pruning; adaptive quantization; parallel computing; reconfigurable architectures
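First-order Taylor filter pruning scores each filter by the magnitude of the first-order loss change its removal would cause. The sketch below uses the common |sum(gradient * weight)| form of that criterion with toy tensors; the paper's exact scoring and thresholds are not reproduced:

```python
import numpy as np

def taylor_filter_scores(weights, grads):
    """First-order Taylor saliency per filter: |sum(grad * weight)| over the
    filter's weights, approximating the loss change if it is removed."""
    return np.abs((weights * grads).sum(axis=tuple(range(1, weights.ndim))))

w = np.ones((3, 2, 2, 2))           # 3 filters of shape 2x2x2
g = np.zeros_like(w); g[0] = 0.5    # only filter 0 has non-zero gradient
scores = taylor_filter_scores(w, g)
prune = np.argsort(scores)[:2]      # prune the 2 lowest-scoring filters
```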
18. Hybrid Quantization Compression and Accelerated Inference Techniques for Large Language Models
Authors: 尹经纬, 李志强, 刘裕彤. 《计算机工程与设计》 (Computer Engineering and Design, PKU Core), 2026, Issue 1, pp. 187-194.
Large language models are widely used in daily study, work, and life, but their huge parameter scale, high resource consumption, and heavy reliance on GPUs for inference severely constrain their adoption. To address these problems, this paper proposes a hybrid INT8 quantization method with outlier-feature optimization for CPU environments, fully exploiting its strengths in model compression. Meanwhile, based on the observation that attention concentrates at the beginning and end of the text, an efficient fast parameter-loading mechanism is designed. The organic combination of the two methods significantly reduces model memory consumption and improves inference efficiency, providing a new technical solution to the deployment bottleneck of large language models in edge computing environments. Comprehensive experiments on an i7-13700 CPU with the LLaMA2, GPT-J, and FSEQ models, using the C4, Wikitext, and PG19 datasets, fully verify the superiority and practical value of the proposed methods.
Keywords: large language models; outlier parameters; hybrid quantization; attention mechanism; fast parameter loading; model inference; edge computing
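Outlier-aware hybrid INT8 quantization commonly keeps outlier feature columns in floating point and quantizes the remaining columns to int8. The sketch below shows that generic split (in the style of LLM.int8-type decompositions) with an illustrative threshold; it is not the paper's specific optimization:

```python
import numpy as np

def hybrid_int8(x, outlier_thresh=6.0):
    """Keep outlier feature columns in float and quantize the rest to
    symmetric int8; return a mixed-precision reconstruction of x."""
    outlier = np.abs(x).max(axis=0) > outlier_thresh
    out = x.astype(np.float64).copy()
    reg = out[:, ~outlier]
    scale = np.abs(reg).max() / 127.0
    out[:, ~outlier] = np.round(reg / scale) * scale  # int8 grid, dequantized
    return out, outlier

x = np.array([[0.5, 8.0], [-1.0, -7.5]])
x_hat, outlier_cols = hybrid_int8(x)
```

Only the few outlier columns stay wide, so almost all of the weight matrix enjoys the 4× storage saving of int8 while the outliers avoid catastrophic rounding.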
19. A Low-Accuracy-Loss Quantization Method for Self-Attention Modules
Authors: 林德铝, 何琨. 《计算机研究与发展》 (Journal of Computer Research and Development, PKU Core), 2026, Issue 1, pp. 162-175.
With the rapid progress of deep learning and the continued exploitation of massive datasets, self-attention modules are widely used in natural language processing, computer vision, and large language models. Although self-attention significantly improves the accuracy of deep learning models, its enormous computational demand makes deployment on compute-constrained devices particularly difficult. Integer quantization, one of the key techniques for deploying models on low-compute chips, suffers relatively high accuracy loss caused by the structural characteristics of self-attention modules. Addressing this problem, this paper analyzes the integer-quantization error of self-attention modules in depth and proposes a pseudo-softmax vector quantization method and a blockwise pseudo-softmax vector quantization method. By applying a specialized integer quantization to the softmax vectors in the self-attention module, the proposed methods aim to substantially accelerate inference while effectively reducing the error introduced by integer quantization. Experimental results show that, compared with traditional direct quantization, pseudo-softmax vector quantization reduces the quantization accuracy loss by 50%, and the blockwise variant reduces it by about 90%. These results fully demonstrate the effectiveness of the two methods in reducing accuracy loss and provide strong support for the efficient deployment of self-attention modules on compute-constrained devices.
Keywords: model quantization; self-attention module; low accuracy loss; inference acceleration; divide and conquer
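A softmax vector has a known range [0, 1], so it can be quantized with a fixed scale instead of a data-dependent one. The sketch below shows only that fixed-scale idea; the paper's pseudo-softmax construction is more elaborate and is not reproduced here:

```python
import numpy as np

def quantize_softmax(p, bits=8):
    """Quantize softmax probabilities to integers with the fixed scale
    1/(2**bits - 1), valid because every entry of p lies in [0, 1]."""
    levels = 2 ** bits - 1
    q = np.round(p * levels).astype(np.uint8)
    return q, 1.0 / levels

logits = np.array([2.0, 1.0, 0.1])
p = np.exp(logits - logits.max()); p /= p.sum()
q, scale = quantize_softmax(p)
p_hat = q * scale  # dequantized probabilities, error at most scale/2
```

A fixed scale removes the per-vector scale computation from the inference path, which is one reason softmax-specific quantizers can be faster than generic direct quantization.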
20. Research on Knowledge-Distillation-Based Compression of Quantized Convolutional Neural Networks
Authors: 何龙超, 武唯康, 李斌, 常迎辉. 《计算机测量与控制》 (Computer Measurement & Control), 2026, Issue 2, pp. 227-234.
To address the high resource consumption of deploying deep convolutional neural networks on edge devices, a joint optimization of knowledge distillation and low-bit quantization is studied. The key technique combines quantization-aware training with joint guidance from a distillation loss: soft-label supervision from a teacher model and projected-gradient-descent optimization effectively mitigate the accuracy loss of low-bit quantization. Experiments on the CIFAR-10 and CIFAR-100 datasets achieve 4-bit quantization of ResNet-series networks, reaching 92.1% accuracy on CIFAR-10 with the model compressed to 0.41 MB. FPGA-based edge deployment shows ResNet-20 inference latency dropping from 82.3 ms to 5.67 ms, meeting the low-latency, high-efficiency requirements of edge computing. The method significantly reduces resource overhead while preserving accuracy, providing an effective solution for neural network deployment in resource-constrained environments.
Keywords: convolutional neural networks; model compression; knowledge distillation; quantization; FPGA; edge computing
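Soft-label distillation trains the student on temperature-softened teacher probabilities. A minimal sketch of the usual Hinton-style KD loss term; the temperature and any weighting against the hard-label loss are generic hyperparameters, not the paper's values:

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)   # temperature-softened, numerically stable
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher_T || student_T) * T^2: the soft-label distillation term,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)))

t = np.array([5.0, 1.0, -2.0])
loss_same = kd_loss(t, t)                       # identical logits -> zero loss
loss_diff = kd_loss(np.array([1.0, 5.0, -2.0]), t)
```

In quantization-aware training this term is added to the student's task loss, so the low-bit student is pulled toward the full-precision teacher's output distribution rather than only the hard labels.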