Post-training quantization (PTQ) can reduce the memory footprint and inference latency of deep models while preserving accuracy, using only a small unlabeled calibration set and no retraining on the full training set. To calibrate a quantized model, current PTQ methods usually select unlabeled calibration data at random from the training set. However, we show that random data selection causes performance instability and degradation due to activation distribution mismatch. In this paper, we address the crucial task of appropriate calibration data selection and propose a novel one-shot selection method, termed SelectQ, which selects specific calibration data via dynamic clustering. SelectQ uses activation statistics and performs layer-wise clustering to learn the activation distribution of the training set. For that purpose, a new metric called knowledge distance is proposed to measure the distance between a sample's activation statistics and the cluster centroids. After calibration with the selected data, quantization noise is alleviated because the distribution mismatch within activations is mitigated. Extensive experiments on ImageNet show that SelectQ improves the top-1 accuracy of 4-bit quantized ResNet18 by over 15% compared to randomly sampled calibration data. Notably, SelectQ requires neither backward propagation nor batch normalization parameters, so it has fewer limitations in practical applications.
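As a concrete illustration of the idea, below is a minimal sketch of clustering-based calibration selection: plain k-means over per-sample activation statistics, then one representative sample per centroid. The paper's actual knowledge-distance metric and dynamic clustering are not reproduced; the feature layout and Lloyd iterations here are assumptions for illustration only.

```python
import numpy as np

def select_calibration_set(features, k, iters=20, seed=0):
    """Cluster per-sample activation statistics with plain k-means and
    return one representative sample index per centroid (a simplified
    stand-in for SelectQ's dynamic clustering; the paper's knowledge
    distance is not reproduced here).

    features: (N, D) array, one statistic vector per candidate sample
              (e.g. per-channel activation means concatenated across layers).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    centroids = features[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):                          # Lloyd iterations
        dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return np.unique(dist.argmin(axis=0))           # nearest sample per centroid

# synthetic activation statistics with three distinct modes
stats = np.vstack([np.random.default_rng(1).normal(m, 1.0, (50, 8))
                   for m in (-2.0, 0.0, 2.0)])
idx = select_calibration_set(stats, k=3)
print(sorted(int(i) for i in idx))
```

In practice the statistic vectors would come from forward passes of the float model over the candidate pool, and the selected indices would form the calibration set.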
Quantization noise introduced by the analog-to-digital converter (ADC) degrades the reliability of communication systems. In this paper, a quantized non-Hermitian symmetry (NHS) orthogonal frequency-division multiplexing-based visible light communication (OFDM-VLC) system is presented. To analyze the effect of ADC resolution on NHS OFDM-VLC, a quantized mathematical model of the system is established. Based on this model, a closed-form bit error rate (BER) expression is derived. Both theoretical analysis and simulation confirm the accuracy of the BER formula for high-resolution ADCs. In addition, channel coding helps compensate for the BER performance loss caused by lower-resolution ADCs.
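For intuition on how ADC resolution sets the noise floor, here is a minimal sketch (our illustration, not from the paper) of an ideal mid-rise b-bit quantizer applied to a near-full-scale sine; the measured signal-to-quantization-noise ratio tracks the classic 6.02b + 1.76 dB rule.

```python
import numpy as np

def adc_quantize(x, bits, full_scale=1.0):
    """Ideal mid-rise uniform quantizer modelling a b-bit ADC."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    code = np.floor(x / step) + 0.5                 # mid-rise reconstruction level
    code = np.clip(code, -(levels / 2 - 0.5), levels / 2 - 0.5)
    return code * step

bits = 10
t = np.arange(4096) / 4096.0
x = 0.99 * np.sin(2 * np.pi * 131 * t)              # near-full-scale sine
noise = x - adc_quantize(x, bits)
sqnr_db = 10 * np.log10(np.mean(x**2) / np.mean(noise**2))
print(round(float(sqnr_db), 1))                     # theory: 6.02*bits + 1.76 ≈ 62 dB
```

Re-running with lower `bits` shows the roughly 6 dB of SQNR lost per bit of resolution, which is the mechanism behind the BER degradation the paper analyzes.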
The LLaMA family, a collection of foundation language models ranging from 7B to 65B parameters, has become one of the most capable open-source large language model (LLM) families and a popular backbone for multi-modal large language models (MLLMs), widely used in computer vision and natural language understanding tasks. In particular, the recently released LLaMA3 models achieve impressive performance across domains thanks to super-large-scale pre-training on over 15T tokens. Given the wide use of low-bit quantization to deploy LLMs in resource-constrained scenarios, we explore LLaMA3's capabilities when quantized to low bit-widths. This exploration can provide new insights into, and expose new challenges for, the low-bit quantization of LLaMA3 and future LLMs, especially in addressing the performance degradation that arises in LLM compression. Specifically, we comprehensively evaluate 10 existing post-training quantization and LoRA fine-tuning (LoRA-FT) methods on LLaMA3 at 1-8 bits across various datasets to reveal its low-bit quantization performance. To probe low-bit quantized MLLMs, we also assess the LLaMA3-based LLaVA-Next-8B model at ultra-low 2-4 bit-widths with post-training quantization methods. Our experimental results show that LLaMA3 still suffers non-negligible degradation in both linguistic and visual contexts, particularly at ultra-low bit-widths, highlighting a significant low-bit performance gap to be addressed in future developments. We expect this empirical study to prove valuable for advancing future models, driving LLMs and MLLMs toward higher accuracy at lower bit-widths and thus greater practicality.
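The simplest member of the PTQ family evaluated in such studies is round-to-nearest (RTN) weight quantization; the minimal sketch below (our own illustration, not the paper's implementation) shows how reconstruction error grows as the bit-width shrinks, the same trend driving the degradation reported for ultra-low bit-widths.

```python
import numpy as np

def rtn_quantize(w, bits):
    """Round-to-nearest (RTN) asymmetric weight PTQ: map weights onto a
    uniform integer grid between min and max, then dequantize."""
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax
    q = np.clip(np.round((w - lo) / scale), 0, qmax)
    return q * scale + lo                           # dequantized weights

w = np.random.default_rng(0).normal(0, 0.02, (256, 256))
for bits in (8, 4, 2):
    err = float(np.abs(w - rtn_quantize(w, bits)).mean())
    print(bits, err)                                # error grows as bits shrink
```

Stronger PTQ methods (and the LoRA-FT variants the study covers) refine this baseline with calibration data or fine-tuning, but the bit-width/error trade-off it exposes is the common starting point.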
Large-scale neural networks have astonished the world, changing people's lives and opening vast prospects. However, they also bring enormous demands on computational power and storage; the core of their computational cost lies in matrix multiplication units dominated by multiplication operations. To address this issue, we propose an area- and power-efficient multiplier-less processing element (PE) design. Before implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model and confirm that this quantization preserves the accuracy of the original model. In hardware, we present a standard PE architecture and a 'bi-sign' variant. Our evaluation shows that a systolic array built from the standard multiplier-less PE achieves approximately 38% lower power-delay product and a 13% smaller core area than a conventional multiply-and-accumulate PE, while the bi-sign PE saves 37% core area and 38% computation energy. Furthermore, the applied quantization reduces model size and operand bit-width, decreasing on-chip memory usage and memory-access energy. Additionally, the hardware schematic facilitates extension to other sparsity-aware, energy-efficient techniques.
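A minimal sketch of the underlying idea (the paper's dictionary-based scheme and PE microarchitecture are richer; the details below are assumptions for illustration): snap each weight to a signed power of two, so that every multiply in a multiply-accumulate becomes an arithmetic shift.

```python
import numpy as np

def pow2_quantize(w, min_exp=-8, max_exp=0):
    """Snap each weight to the nearest signed power of two so that a MAC
    reduces to shift-and-add (sketch only)."""
    sign = np.sign(w).astype(int)
    mag = np.clip(np.abs(w), 2.0**min_exp, 2.0**max_exp)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp).astype(int)
    return sign, exp

def shift_mac(x_fixed, sign, exp):
    """Multiplier-less dot product on integer activations: each product
    x * (±2^e) is an arithmetic shift instead of a multiply."""
    acc = 0
    for xi, si, ei in zip(x_fixed, sign, exp):
        si, ei = int(si), int(ei)
        term = int(xi) << ei if ei >= 0 else int(xi) >> -ei
        acc += si * term
    return acc

w = np.array([0.24, -0.5, 0.13, 1.0])        # toy weights
sign, exp = pow2_quantize(w)                 # exponents: -2, -1, -3, 0
x = np.array([8, 16, 4, 2])                  # fixed-point activations
print(shift_mac(x, sign, exp))               # exact float product would be -3.5
```

In hardware, the (sign, exponent) pair is all a PE needs to store per weight, which is where the area and energy savings over a full multiplier come from.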
Deep learning technology is widely used in computer vision. Typically, large amounts of data are used to train model weights so as to obtain higher accuracy. However, massive data and complex model structures demand substantial computing resources, while in many application scenarios people can only carry mobile and portable devices, so neural networks face limits on compute, size, and power consumption. Therefore, the efficient lightweight model MobileNet is used as the base network in this study for optimization. First, the accuracy of MobileNet is improved by adding components such as the convolutional block attention module (CBAM) and expansion convolution. Then, the model is compressed using magnitude-based pruning and weight quantization. Next, a garbage classification dataset is built using Python crawlers and data augmentation. Based on these model optimizations, a garbage classification application is deployed on mobile phones and Raspberry Pis, making the classification task more convenient to complete.
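Of the compression steps, magnitude-based pruning is the easiest to make concrete. A minimal sketch (our illustration, not the study's code) that zeroes the smallest-magnitude fraction of a weight tensor:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero the smallest-magnitude `sparsity` fraction of weights,
    as in weight-size-based pruning (sketch only)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # threshold = k-th smallest absolute value
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

w = np.random.default_rng(0).normal(0, 1, (64, 64))
wp = magnitude_prune(w, 0.5)
print(float((wp == 0).mean()))               # ≈ 0.5
```

In a deployment pipeline the pruned weights would typically be fine-tuned briefly to recover accuracy, then quantized before export to the mobile runtime.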
A comprehensive study of the antiferromagnetic (AFM) structures of perovskite-type EuZrO3 is carried out using the double-time Green's function. Two possible types of AFM configurations are considered, and the theoretical results are compared with experiment to extract the values of the parameters J1, J2, and D. The obtained exchange constants are then used to calculate the magnetic susceptibility, which is in turn compared with the experimental one. We therefore conclude that the magnetic structure of EuZrO3 may be either an isotropic G-type structure or an anisotropic A-type structure.
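The abstract does not spell out the spin Hamiltonian behind the J1-J2-D fit; a common form for such perovskite antiferromagnets (an assumed convention shown for background, not taken from the paper) is:

```latex
% Heisenberg model with nearest- (J_1) and next-nearest-neighbor (J_2)
% exchange plus single-ion anisotropy D (assumed illustrative form):
H \;=\; -\,J_1 \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
        \;-\; J_2 \sum_{\langle\langle i,j \rangle\rangle} \mathbf{S}_i \cdot \mathbf{S}_j
        \;-\; D \sum_i \left( S_i^z \right)^2
```

The signs of the fitted J1 and J2 distinguish the candidate G-type and A-type orderings, and D controls the anisotropy that separates the two scenarios.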
In the present paper we propose a spin-1/2 chain that provides an exactly solvable example for studying Ising criticality with central charge c = 1/2. We diagonalize this model in the presence of a magnetic field. From the full energy spectrum, the central charge and the scaling dimensions are obtained at the critical point. The results show clearly that quantum Ising criticality exists in this system.
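The specific chain of the paper is not given here, so as background the sketch below exactly diagonalizes the standard transverse-field Ising chain, the canonical lattice realization of c = 1/2 Ising criticality, at its critical field h = 1 (an illustrative stand-in, not the paper's model):

```python
import numpy as np

sx = np.array([[0., 1.], [1., 0.]])
sz = np.array([[1., 0.], [0., -1.]])

def op(single, site, n):
    """Embed a single-site Pauli operator at `site` in an n-site chain."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, single if j == site else np.eye(2))
    return out

def tfim_hamiltonian(n, h):
    """H = -sum_i sx_i sx_{i+1} - h sum_i sz_i (open boundary)."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H -= op(sx, i, n) @ op(sx, i + 1, n)
    for i in range(n):
        H -= h * op(sz, i, n)
    return H

E = np.linalg.eigvalsh(tfim_hamiltonian(8, 1.0))   # h = 1: critical point
print(round(float(E[0]), 3), round(float(E[1] - E[0]), 3))
```

From such finite-size spectra, the central charge and scaling dimensions can be extracted by tracking how the ground-state energy and low-lying gaps scale with chain length.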
We propose a simple spin-1/2 model that provides an exactly solvable example for studying Ising criticality with central charge c = 1/2. By mapping it onto real Majorana fermions, the Ising critical behavior is explored explicitly, even though its bosonized form is not the double-frequency sine-Gordon model.
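The spin-to-Majorana route invoked here is, in its standard form (shown as textbook background, not as the paper's specific mapping), the Jordan-Wigner construction followed by splitting each complex fermion into two Majorana modes:

```latex
% Jordan-Wigner fermions from spin-1/2 operators, then Majorana modes:
c_j^{\dagger} = \Big( \prod_{l<j} \sigma_l^z \Big) \sigma_j^{+}, \qquad
c_j = \Big( \prod_{l<j} \sigma_l^z \Big) \sigma_j^{-},
\\
a_{2j-1} = c_j + c_j^{\dagger}, \qquad
a_{2j} = -\,i \left( c_j - c_j^{\dagger} \right), \qquad
\{ a_m, a_n \} = 2\,\delta_{mn}.
```

A single gapless real Majorana branch is exactly what carries central charge c = 1/2, which is why the mapping exposes the Ising criticality directly.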
Quantization has emerged as an essential technique for deploying deep neural networks (DNNs) on resource-limited devices. However, quantized models are vulnerable to various types of noise in real-world applications. Despite the importance of evaluating the impact of quantization on robustness, existing research on this topic is limited and often disregards established principles of robustness evaluation, leading to incomplete and inconclusive findings. To address this gap, we thoroughly evaluate the robustness of quantized models against various types of noise (adversarial attacks, natural corruption, and systematic noise) on ImageNet. The comprehensive results provide valuable empirical insights into the robustness of quantized models. For example: 1) quantized models exhibit higher adversarial robustness than their floating-point counterparts, but are more vulnerable to natural corruption and systematic noise; 2) in general, increasing the quantization bit-width decreases adversarial robustness while increasing natural and systematic robustness; 3) among corruptions, impulse noise and glass blur are the most harmful to quantized models, while brightness has the least impact; 4) among types of systematic noise, nearest-neighbor interpolation has the highest impact, while bilinear, cubic, and area interpolation are the three least harmful. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
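A minimal evaluation-harness sketch of the protocol (synthetic data and a toy linear model, purely our illustration, not the paper's ImageNet pipeline): quantize weights post hoc, then compare accuracy on clean versus corrupted inputs across bit-widths.

```python
import numpy as np

def quantize(w, bits):
    """Uniform round-to-nearest weight quantization (dequantized)."""
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    s = (hi - lo) / qmax
    return np.clip(np.round((w - lo) / s), 0, qmax) * s + lo

def accuracy(w, x, y):
    """Top-1 accuracy of a linear scorer x @ w.T."""
    return float(((x @ w.T).argmax(axis=1) == y).mean())

rng = np.random.default_rng(0)
centers = rng.normal(0, 3, (10, 32))           # 10 class prototypes
y = rng.integers(0, 10, 2000)
x = centers[y] + rng.normal(0, 1, (2000, 32))  # clean test set
w_fp = centers                                  # toy "trained" linear model

for bits in (8, 4, 2):
    wq = quantize(w_fp, bits)
    clean = accuracy(wq, x, y)
    corrupted = accuracy(wq, x + rng.normal(0, 2, x.shape), y)  # "natural corruption"
    print(bits, round(clean, 3), round(corrupted, 3))
```

A full study would swap in a real network, standard corruption suites, adversarial attacks, and preprocessing (interpolation) variants for the systematic-noise axis, but the clean-versus-corrupted accuracy comparison per bit-width is the core loop.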
Funding (SelectQ paper): partially supported by the National Natural Science Foundation of China (Nos. 62072151, 62376236, and 61932009), the Anhui Provincial Natural Science Fund for Distinguished Young Scholars, China (No. 2008085J30), the Open Foundation of Yunnan Key Laboratory of Software Engineering, China (No. 2023SE103), the CCF-Baidu Open Fund, the CAAI-Huawei MindSpore Open Fund, the Shenzhen Science and Technology Program, China (No. ZDSYS20230626091302006), and the Key Project of Science and Technology of Guangxi, China (No. AB22035022-2021AB20147).
Funding (NHS OFDM-VLC paper): supported by the National Natural Science Foundation of China (No. 62201508), the Zhejiang Provincial Natural Science Foundation of China (Nos. LZ21F010001 and LQ23F010004), and the State Key Laboratory of Millimeter Waves, Southeast University, China (No. K202212).
Funding (LLaMA3 quantization study): supported by the National Science and Technology Major Project (2021ZD0110503), the Swiss National Science Foundation (SNSF) project 200021E_219943 Neuromorphic Attention Models for Event Data (NAMED), the Baidu Scholarship, and the National Natural Science Foundation of China (Nos. 62306025 and 92367204).
Funding (multiplier-less PE paper): supported by the Waseda University Open Innovation Ecosystem Program for Pioneering Research (W-SPRING) under Grant Number JPMJSP2128.
Funding (EuZrO3 paper): supported by the National Natural Science Foundation of China (Grant Nos. 11404046, 11347217, and 61201119), the Basic Research Foundation of Chongqing Education Committee, China (Grant No. KJ130615), and the Chongqing Science & Technology Committee, China (Grant Nos. cstc2014jcyjA50013 and cstc2013jjB50001).
Funding (spin-1/2 Majorana model paper): supported by the National Natural Science Foundation of China and the RFDP.
Funding (quantized-model robustness study): supported by the National Key R&D Program of China (No. 2022ZD0116310), the National Natural Science Foundation of China (Nos. 62022009 and 62206009), and the State Key Laboratory of Software Development Environment.