Abstract
In edge computing environments, fine-tuning pre-trained models on real-world data gives users personalized models without uploading their data, thereby protecting privacy. However, the substantial memory overhead of fine-tuning poses a major challenge for resource-constrained edge devices, and the quantization errors introduced during deployment often degrade model accuracy. To address these issues, this paper proposes a quantization-aware training method designed for edge environments that combines layer-priority-guided sparse update with block-wise quantization. First, to reduce memory consumption during transfer learning, a sparse-update method is designed based on a layer-priority evaluation metric. This metric integrates the parameter count, MAC count, and Fisher information entropy of each layer to selectively update a subset of layers, significantly reducing fine-tuning memory usage while maintaining model accuracy. Second, to tackle the accuracy loss caused by quantized deployment, a block-wise quantization strategy is introduced. By dividing the input data into multiple sub-blocks and quantizing each independently, this strategy effectively mitigates the quantization errors caused by outliers, thereby improving the accuracy of the quantized model. Experimental results demonstrate that the proposed method outperforms traditional approaches across a range of models: compared with conventional fine-tuning, it reduces memory consumption during fine-tuning by up to 61% and limits the accuracy loss after quantized deployment to as little as 0.2%. These results confirm the effectiveness and practicality of the proposed method on resource-constrained edge devices.
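The abstract does not give the exact form of the layer-priority metric; a minimal illustrative sketch follows, assuming min-max normalization of the three factors and a weighted combination that favors layers with high Fisher information but low update cost (parameter count and MACs). The function and parameter names (`layer_priority`, `select_layers`, `weights`) are hypothetical, not from the paper.

```python
import numpy as np

def layer_priority(param_counts, macs, fisher_info, weights=(1.0, 1.0, 1.0)):
    """Illustrative layer-priority score: min-max normalize each factor
    across layers, then combine so that informative layers (high Fisher
    information) that are cheap to update (few parameters, few MACs)
    score highest."""
    p = np.asarray(param_counts, dtype=float)
    m = np.asarray(macs, dtype=float)
    f = np.asarray(fisher_info, dtype=float)
    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-12)
    wp, wm, wf = weights
    return wf * norm(f) - wp * norm(p) - wm * norm(m)

def select_layers(scores, k):
    """Pick the k highest-priority layers to update; all others stay frozen,
    which is what saves activation/gradient memory during fine-tuning."""
    return sorted(np.argsort(scores)[-k:].tolist())
```

With such a score, only the selected layers keep gradients and optimizer state, so memory scales with the chosen subset rather than the whole network.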
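The block-wise quantization idea can likewise be sketched. This is not the paper's implementation; it is a generic per-block symmetric quantizer in which each sub-block gets its own scale, so an outlier only widens the quantization step of its own block instead of the whole tensor. The names `quantize_blockwise` and `block_size` are assumptions for illustration.

```python
import numpy as np

def quantize_blockwise(x, block_size=64, n_bits=8):
    """Split a flat tensor into sub-blocks and quantize each block with
    its own symmetric scale; returns the dequantized values and the
    per-block scales."""
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit
    x = np.asarray(x, dtype=float).ravel()
    out = np.empty_like(x)
    scales = []
    for start in range(0, x.size, block_size):
        block = x[start:start + block_size]
        scale = max(np.abs(block).max() / qmax, 1e-12)  # avoid div-by-zero
        q = np.clip(np.round(block / scale), -qmax - 1, qmax)
        out[start:start + block_size] = q * scale  # dequantize
        scales.append(scale)
    return out, scales
```

For a tensor mixing small values with a large outlier, per-tensor quantization would round the small values to zero, while per-block scales preserve them, which is the error-reduction mechanism the abstract describes.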
Authors
凌翔宇
江结林
鲍家坤
LING Xiang-yu; JIANG Jie-lin; BAO Jia-kun (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; Nanjing Institute of InforSuperBahn, Nanjing 211135, China)
Published in
《中国电子科学研究院学报》
2025, No. 5, pp. 465-474 (10 pages)
Journal of China Academy of Electronics and Information Technology
Funding
Supported by the National Youth Science Fund Project (62001236).
Keywords
edge computing
quantization-aware training
layer-priority evaluation metric
sparse-update
block-wise quantization