Journal articles
9 articles found
1. SelectQ: Calibration Data Selection for Post-training Quantization
Authors: Zhao Zhang, Yangcheng Gao, Jicong Fan, Zhongqiu Zhao, Yi Yang, Shuicheng Yan. Machine Intelligence Research, 2025, No. 3, pp. 499-510 (12 pages)
Abstract: Post-training quantization (PTQ) can reduce the memory footprint and latency of deep model inference while preserving model accuracy, using only a small unlabeled calibration set and without retraining on the full training set. To calibrate a quantized model, current PTQ methods usually select unlabeled data at random from the training set as calibration data. However, we show that random data selection results in performance instability and degradation due to activation distribution mismatch. In this paper, we tackle the crucial task of appropriate calibration data selection and propose a novel one-shot calibration data selection method, termed SelectQ, which selects specific data for calibration via dynamic clustering. SelectQ uses activation statistics and performs layer-wise clustering to learn the activation distribution of the training set. For that purpose, a new metric called knowledge distance is proposed to measure the distance between activation statistics and cluster centroids. After calibration with the selected data, quantization noise is alleviated by mitigating the distribution mismatch within activations. Extensive experiments on the ImageNet dataset show that SelectQ improves the top-1 accuracy of ResNet18 by over 15% in 4-bit quantization compared with randomly sampled calibration data. Notably, SelectQ requires neither backpropagation nor batch normalization parameters, so it has fewer limitations in practical applications. (A hedged selection sketch follows this entry.)
Keywords: model compression, low-bit model quantization, less performance loss, one-shot dynamic clustering, calibration data selection
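The abstract above describes the selection idea only at a high level. As a rough, hedged illustration of the general recipe (cluster per-sample activation statistics, then keep the samples closest to the centroids), a toy numpy sketch is given below; the plain k-means step and all names are our assumptions, not the paper's actual SelectQ algorithm or its knowledge-distance metric.

    import numpy as np

    def select_calibration_data(act_stats, num_select, num_clusters=8, iters=20, seed=0):
        """Toy stand-in for clustering-based calibration data selection.

        act_stats: (N, D) array of per-sample activation statistics
        (e.g., layer-wise means/stds). Returns indices of the samples whose
        statistics lie closest to their assigned cluster centroid.
        """
        rng = np.random.default_rng(seed)
        centroids = act_stats[rng.choice(len(act_stats), num_clusters, replace=False)]
        for _ in range(iters):  # plain k-means updates
            dist = np.linalg.norm(act_stats[:, None] - centroids[None], axis=-1)
            assign = dist.argmin(axis=1)
            for c in range(num_clusters):
                if np.any(assign == c):
                    centroids[c] = act_stats[assign == c].mean(axis=0)
        dist_to_own = np.linalg.norm(act_stats - centroids[assign], axis=-1)
        return np.argsort(dist_to_own)[:num_select]

    # toy usage: 256 candidate samples with 32-dimensional statistics, pick 32
    stats = np.random.default_rng(1).random((256, 32))
    calib_idx = select_calibration_data(stats, num_select=32)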
2. BER performance analysis of non-Hermitian symmetry OFDM-VLC systems with ADC quantization noise
Authors: WANG Zhongpeng, AI Caihua, ZHANG Lijuan. Optoelectronics Letters, 2025, No. 11, pp. 677-683 (7 pages)
Abstract: Quantization noise caused by the analog-to-digital converter (ADC) degrades the reliability of communication systems. In this paper, a quantized non-Hermitian symmetry (NHS) orthogonal frequency-division multiplexing-based visible light communication (OFDM-VLC) system is presented. To analyze the effect of ADC resolution on NHS OFDM-VLC, a quantized mathematical model of NHS OFDM-VLC is established. Based on the proposed quantized model, a closed-form bit error rate (BER) expression is derived. Theoretical analysis and simulation results both confirm the validity of the obtained BER formula for high-resolution ADCs. In addition, channel coding helps compensate for the BER performance loss caused by using a lower-resolution ADC. (A generic quantization-noise check follows this entry.)
Keywords: quantized model, communication systems, bit error rate (BER), quantized mathematical model, reliability performance degradation, non-Hermitian symmetry, ADC quantization, OFDM-VLC
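As background for the quantization-noise term in such an analysis, the standard additive-noise model for a uniform b-bit converter assigns the error a variance of step^2/12, which for a full-scale sinusoid gives the familiar SQNR of about 6.02 b + 1.76 dB. The short sketch below is a generic numerical check of that rule of thumb, not the paper's NHS OFDM-VLC model or its closed-form BER expression.

    import numpy as np

    def quantize(x, bits, full_scale=1.0):
        """Uniform mid-rise quantizer over [-full_scale, full_scale]."""
        levels = 2 ** bits
        step = 2 * full_scale / levels
        idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
        return (idx + 0.5) * step

    t = np.linspace(0, 1, 100_000, endpoint=False)
    x = np.sin(2 * np.pi * 97 * t)  # full-scale sinusoidal test signal
    for bits in (4, 6, 8, 10):
        err = x - quantize(x, bits)
        sqnr = 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))
        print(f"{bits}-bit ADC: SQNR = {sqnr:5.2f} dB (rule of thumb {6.02 * bits + 1.76:5.2f} dB)")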
3. An empirical study of LLaMA3 quantization: from LLMs to MLLMs (Cited by: 2)
Authors: Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno. Visual Intelligence, 2024, No. 1, pp. 457-469 (13 pages)
Abstract: The LLaMA family, a collection of foundation language models ranging from 7B to 65B parameters, has become one of the most powerful open-source large language models (LLMs) and a popular LLM backbone for multi-modal large language models (MLLMs), widely used in computer vision and natural language understanding tasks. In particular, the LLaMA3 models have recently been released and achieve impressive performance across domains after super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-constrained scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration can provide new insights and challenges for the low-bit quantization of LLaMA3 and other future LLMs, especially in addressing the performance degradation encountered in LLM compression. Specifically, we comprehensively evaluate 10 existing post-training quantization and LoRA fine-tuning (LoRA-FT) methods on LLaMA3 at 1-8 bits and on various datasets to reveal its low-bit quantization performance. To uncover the capabilities of low-bit quantized MLLMs, we also assess the LLaMA3-based LLaVA-Next-8B model under 2-4 ultra-low bits with post-training quantization methods. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in linguistic and visual contexts, particularly at ultra-low bit-widths. This highlights a significant performance gap at low bit-width that needs to be addressed in future developments. We expect this empirical study to prove valuable in advancing future models, driving LLMs and MLLMs toward higher accuracy at lower bit-widths and greater practicality. (A minimal round-to-nearest baseline is sketched after this entry.)
Keywords: model quantization, large language model, multi-modal, deep learning
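For readers unfamiliar with the baseline such studies start from, the simplest post-training scheme is round-to-nearest (RTN) weight quantization with one scale per output channel. The sketch below is a generic, hedged illustration of that baseline only; it is not any of the ten methods benchmarked in the paper.

    import numpy as np

    def rtn_quantize_per_channel(w, bits=4):
        """Symmetric round-to-nearest quantization of a weight matrix,
        with one scale per output channel (row)."""
        qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
        scale = np.abs(w).max(axis=1, keepdims=True) / qmax
        scale[scale == 0] = 1.0                         # guard all-zero rows
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q.astype(np.int8), scale                 # integer weights + fp scales

    w = np.random.default_rng(0).standard_normal((8, 16)).astype(np.float32)
    q, s = rtn_quantize_per_channel(w, bits=4)
    print("max reconstruction error:", np.abs(w - q * s).max())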
4. An Efficient Multiplier-Less Processing Element on Power-of-2 Dictionary-Based Data Quantization
Authors: Jiaxiang Li, Masao Yanagisawa, Youhua Shi. Integrated Circuits and Systems, 2024, No. 1, pp. 53-62 (10 pages)
Abstract: Large-scale neural networks have brought incredible shocks to the world, changing people's lives and offering vast prospects. However, they also impose enormous demands on computational power and storage; the core of their computational requirements lies in matrix multiplication units dominated by multiplication operations. To address this issue, we propose an area- and power-efficient multiplier-less processing element (PE) design. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model and confirm that this quantization method preserves the accuracy of the original model. In the hardware design, we present a standard architecture of the PE and a 'bi-sign' variant. Our evaluation results demonstrate that a systolic array implementing our standard multiplier-less PE achieves approximately 38% lower power-delay product and a 13% smaller core area than a conventional multiply-and-accumulate PE, while the bi-sign PE design saves 37% of core area and 38% of computation energy. Furthermore, the applied quantization reduces the model size and operand bit-width, decreasing on-chip memory usage and the energy consumed by memory accesses. The hardware schematic also facilitates extension to other sparsity-aware, energy-efficient techniques. (A power-of-two quantization sketch follows this entry.)
Keywords: AI accelerators, approximate computing, efficient-computing, model quantization, multiplier-less processing element
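The key property exploited by such designs is that once every weight is restricted to a signed power of two, each multiplication reduces to a bit shift of the activation. The numpy sketch below illustrates a simple power-of-two rounding (in the log domain); it is an assumption-laden stand-in, not the paper's dictionary construction or its PE microarchitecture.

    import numpy as np

    def power_of_two_quantize(w, min_exp=-8, max_exp=0):
        """Snap each weight to a signed power of two (log-domain rounding),
        so a multiply w * x can be realized as a shift of x by the stored exponent."""
        sign = np.sign(w)
        mag = np.abs(w)
        exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** min_exp))), min_exp, max_exp)
        wq = sign * 2.0 ** exp
        wq[mag < 2.0 ** (min_exp - 1)] = 0.0  # very small weights snap to zero
        return wq, exp.astype(int)

    w = np.random.default_rng(0).uniform(-1, 1, 8)
    wq, exp = power_of_two_quantize(w)
    print(np.round(w, 3), "->", wq)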
5. MobileNet network optimization based on convolutional block attention module (Cited by: 3)
Authors: ZHAO Shuxu, MEN Shiyao, YUAN Lin. Journal of Measurement Science and Instrumentation (CAS, CSCD), 2022, No. 2, pp. 225-234 (10 pages)
Abstract: Deep learning technology is widely used in computer vision. Generally, a large amount of data is used to train model weights in deep learning so as to obtain a model with higher accuracy. However, massive data and complex model structures require more computing resources. Since users generally carry only mobile and portable devices in application scenarios, neural networks face limitations in computing resources, size and power consumption. Therefore, the efficient lightweight model MobileNet is used as the basic network for optimization in this study. First, the accuracy of the MobileNet model is improved by adding the convolutional block attention module (CBAM) and expansion (dilated) convolution. Then, the MobileNet model is compressed using pruning and weight quantization algorithms based on weight magnitude. Afterwards, Python crawlers and data augmentation are employed to create a garbage classification dataset. Based on the above model optimization strategy, a garbage classification mobile application is deployed on mobile phones and Raspberry Pi devices, making the garbage classification task more convenient. (A channel-attention sketch follows this entry.)
Keywords: MobileNet, convolutional block attention module (CBAM), model pruning and quantization, edge machine learning
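To make the attention step concrete, here is a small numpy sketch of the channel-attention half of CBAM: average- and max-pooled channel descriptors pass through a shared two-layer MLP, and a sigmoid gate reweights the channels. The weights are random placeholders, and the function is only an illustration of the mechanism, not the paper's trained MobileNet.

    import numpy as np

    def channel_attention(feat, reduction=16, seed=0):
        """CBAM-style channel attention on a (C, H, W) feature map."""
        C = feat.shape[0]
        avg = feat.mean(axis=(1, 2))  # average-pooled channel descriptor, shape (C,)
        mx = feat.max(axis=(1, 2))    # max-pooled channel descriptor, shape (C,)
        rng = np.random.default_rng(seed)
        w1 = 0.1 * rng.standard_normal((C // reduction, C))  # shared MLP weights
        w2 = 0.1 * rng.standard_normal((C, C // reduction))  # (random placeholders)
        mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)          # FC -> ReLU -> FC
        gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))    # sigmoid gate
        return feat * gate[:, None, None]                     # reweight channels

    feat = np.random.default_rng(1).random((32, 8, 8))
    print(channel_attention(feat).shape)  # (32, 8, 8)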
6. Possible magnetic structures of EuZrO_3
Authors: 胡爱元, 秦国平, 毋志民, 崔玉亭. Chinese Physics B (SCIE, EI, CAS, CSCD), 2015, No. 6, pp. 545-550 (6 pages)
Abstract: A comprehensive study of the antiferromagnetic (AFM) structures of perovskite-type EuZrO3 is carried out using the double-time Green's function method. Two possible types of AFM configurations are considered, and the theoretical results are compared with experiment to extract the values of the parameters J1, J2, and D. The obtained exchange constants are then used to calculate the magnetic susceptibility, which is in turn compared with the experimental one. We therefore conclude that the magnetic structure of EuZrO3 may be either an isotropic G-type structure or an anisotropic A-type structure. (A candidate spin Hamiltonian is written out after this entry.)
Keywords: quantized spin model, quantum phase transition, europium titanate
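The abstract names the fitted parameters J1, J2 and D but not the model itself. A standard spin Hamiltonian consistent with those parameters, written here only as an assumption about the likely form (nearest- and next-nearest-neighbour Heisenberg exchange plus single-ion anisotropy), is

    H = -J_1 \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
        - J_2 \sum_{\langle\langle i,j \rangle\rangle} \mathbf{S}_i \cdot \mathbf{S}_j
        - D \sum_i (S_i^z)^2 ,

with the competition between J_1 and J_2 (and the anisotropy D) deciding which ordering, G-type or A-type, is favoured; D = 0 corresponds to the isotropic case.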
7. Quantum Ising Criticality in Dimerized XY Spin-1/2 Chain
Authors: YE Fei. Communications in Theoretical Physics (SCIE, CAS, CSCD), 2003, No. 4, pp. 487-492 (6 pages)
Abstract: In the present paper we propose a spin-1/2 chain that provides an exactly solvable example for studying Ising criticality with central charge c = 1/2. We diagonalize this model in the presence of a magnetic field. From the full energy spectrum, the central charge and the scaling dimensions are obtained at the critical point. The results show clearly that quantum Ising criticality exists in such a system. (A generic dimerized XY Hamiltonian is written out after this entry.)
Keywords: quantized spin models, Ising criticality
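The abstract does not reproduce the Hamiltonian, so as a point of reference (our assumption about the generic form, not necessarily the exact model of the paper) a dimerized XY spin-1/2 chain in a magnetic field reads

    H = \sum_{j} J \left[ 1 + (-1)^{j} \delta \right]
        \left( S_j^x S_{j+1}^x + S_j^y S_{j+1}^y \right) - h \sum_{j} S_j^z ,

with δ the dimerization strength and h the applied field; the central charge and scaling dimensions quoted in the abstract are read off from the finite-size spectrum of such a chain at its critical point.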
8. Ising Transition in Dimerized XY Quantum Spin Chain
Authors: YE Fei, DING Guo-Hui, et al. Communications in Theoretical Physics (SCIE, CAS, CSCD), 2002, No. 4, pp. 492-494 (3 pages)
Abstract: We propose a simple spin-1/2 model that provides an exactly solvable example for studying Ising criticality with central charge c = 1/2. By mapping it onto real Majorana fermions, the Ising critical behavior is explored explicitly, even though its bosonized form is not the double-frequency sine-Gordon model. (The standard Majorana construction is recalled after this entry.)
Keywords: quantized spin models, Ising criticality
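The mapping to real Majorana fermions mentioned in the abstract is not spelled out there; for orientation, one standard construction (our choice of conventions, which may differ from the paper's) attaches two Majorana operators to every site through a Jordan-Wigner string,

    a_j = \Big( \prod_{k<j} \sigma_k^z \Big) \sigma_j^x ,
    \qquad
    b_j = \Big( \prod_{k<j} \sigma_k^z \Big) \sigma_j^y ,

so that a_j and b_j are Hermitian, square to one, and anticommute with one another; an XY-type chain then becomes a quadratic (free-fermion) form in these operators, which is how an Ising sector with c = 1/2 can be isolated.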
9. RobustMQ: benchmarking robustness of quantized models
Authors: Yisong Xiao, Aishan Liu, Tianyuan Zhang, Haotong Qin, Jinyang Guo, Xianglong Liu. Visual Intelligence, 2023, No. 1, pp. 13-27 (15 pages)
Abstract: Quantization has emerged as an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. However, quantized models exhibit vulnerabilities when exposed to various types of noise in real-world applications. Despite the importance of evaluating the impact of quantization on robustness, existing research on this topic is limited and often disregards established principles of robustness evaluation, resulting in incomplete and inconclusive findings. To address this gap, we thoroughly evaluate the robustness of quantized models against various types of noise (adversarial attacks, natural corruption, and systematic noise) on ImageNet. The comprehensive evaluation results provide valuable empirical insights into the robustness of quantized models in various scenarios. For example: 1) quantized models exhibit higher adversarial robustness than their floating-point counterparts, but are more vulnerable to natural corruption and systematic noise; 2) in general, increasing the quantization bit-width decreases adversarial robustness, increases natural robustness, and increases systematic robustness; 3) among corruption methods, impulse noise and glass blur are the most harmful to quantized models, while brightness has the least impact; 4) among the different types of systematic noise, nearest-neighbor interpolation has the highest impact, while bilinear, cubic, and area interpolation are the three least harmful. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios. (A toy corruption-robustness loop is sketched after this entry.)
Keywords: model quantization, model robustness, robustness benchmark, computer vision
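To show what a single cell of such a benchmark looks like in practice, the sketch below measures accuracy under additive Gaussian corruption at a few severities. The predict interface and severity values are hypothetical placeholders, not RobustMQ's actual API, corruption suite, or protocol.

    import numpy as np

    def robust_accuracy(predict, images, labels, sigmas=(0.02, 0.05, 0.1)):
        """Accuracy of predict() on clean and Gaussian-corrupted inputs.

        predict maps a batch of images in [0, 1] to class indices; it stands
        in for any classifier, quantized or floating-point (hypothetical interface).
        """
        results = {"clean": float((predict(images) == labels).mean())}
        rng = np.random.default_rng(0)
        for sigma in sigmas:
            noisy = np.clip(images + rng.normal(0.0, sigma, images.shape), 0.0, 1.0)
            results[f"gauss_{sigma}"] = float((predict(noisy) == labels).mean())
        return results

    # toy usage with a stub classifier that thresholds mean intensity
    imgs = np.random.default_rng(1).random((64, 3, 32, 32))
    labs = (imgs.mean(axis=(1, 2, 3)) > 0.5).astype(int)
    print(robust_accuracy(lambda x: (x.mean(axis=(1, 2, 3)) > 0.5).astype(int), imgs, labs))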