Abstract: This paper presents the dependency-aware offloading framework (DeAOff), an algorithm designed to optimize the deployment of Gen-AI decoder models in mobile edge computing (MEC) environments. Such models pose significant challenges because of their interlayer dependencies and high computational demands, especially under edge resource constraints. To address these challenges, we propose a two-phase optimization algorithm that first handles dependency-aware task allocation and then optimizes energy consumption. By modeling the inference process as directed acyclic graphs (DAGs) and applying constraint relaxation techniques, our approach reduces execution latency and energy usage. Experimental results show a reduction of up to 20% in task completion time and approximately 30% savings in energy consumption compared to traditional methods. These outcomes underscore the solution's robustness in managing complex sequential dependencies and dynamic MEC conditions, enhancing quality of service. Our work thus presents a practical and efficient resource optimization strategy for deploying models in resource-constrained MEC scenarios.
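To make the DAG view of inference concrete, the following minimal sketch evaluates the completion time of a given placement of dependent tasks across a device and an edge server. It is not the DeAOff algorithm: the task names, timings, the `TRANSFER_MS` constant, and the `completion_time` helper are all hypothetical, and energy is not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG of a small decoder: task -> set of tasks it depends on.
deps = {
    "embed": set(),
    "layer1": {"embed"},
    "layer2": {"layer1"},
    "lm_head": {"layer2"},
}

# Hypothetical execution times in milliseconds on each location.
exec_ms = {
    "embed": {"device": 2, "edge": 1},
    "layer1": {"device": 20, "edge": 6},
    "layer2": {"device": 20, "edge": 6},
    "lm_head": {"device": 8, "edge": 3},
}

# Hypothetical cost of moving activations between device and edge.
TRANSFER_MS = 5


def completion_time(placement):
    """Finish time of the last task under a fixed task -> location placement.

    Tasks are visited in topological order, so every predecessor's finish
    time is known before its successors are processed. Contention between
    tasks sharing a location is ignored (an assumption of this sketch).
    """
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        ready = 0.0
        for pred in deps[task]:
            moved = placement[pred] != placement[task]
            ready = max(ready, finish[pred] + (TRANSFER_MS if moved else 0))
        finish[task] = ready + exec_ms[task][placement[task]]
    return max(finish.values())


all_device = {t: "device" for t in deps}
all_edge = {t: "edge" for t in deps}
mixed = {"embed": "device", "layer1": "edge", "layer2": "edge", "lm_head": "device"}

times = {
    "all_device": completion_time(all_device),
    "all_edge": completion_time(all_edge),
    "mixed": completion_time(mixed),
}
```

Because each task can start only after its predecessors finish, the placement of one layer affects every layer downstream of it, which is why dependency-aware allocation differs from assigning tasks independently.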
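Abstract: As the parameter counts of large language models (LLMs) grow exponentially, model deployment and inference face severe memory and compute challenges. Quantization, a core model compression method, significantly reduces a model's storage requirements and computational overhead by lowering the numerical precision of weights and activations. This paper first reviews the development of quantization techniques, from classic Int8/4 quantization methods to state-of-the-art ultra-low-bit quantization algorithms, summarizes the technical characteristics and performance trends of representative methods, and points out that conventional real-domain quantization is limited by discretization error at extremely low bit widths and struggles to break through its performance ceiling. It then systematically surveys a series of works on complex-domain quantization. This line of work proposes a quantization paradigm based on the complex domain: by introducing two degrees of freedom, amplitude and phase, into the parameter representation, it significantly expands the model's representational space. In addition, by analogy with the classic signal-processing paradigm in which a time-domain signal is given a stable representation through a Fourier transform followed by low-pass filtering, it further proposes a technical route in which a real-valued model undergoes a complex-domain transformation and complex-domain quantization to achieve stable, multiplication-free inference. Experimental results show that this approach outperforms existing ultra-low-bit quantization methods on multiple benchmark datasets, effectively breaking through the performance ceiling of real-domain models and demonstrating the potential value of complex-domain quantization for efficient modeling and performance preservation. Overall, through a systematic analysis of the evolution of quantization techniques and of the complex-domain quantization series, the paper aims to reveal the development patterns and future trends of ultra-low-bit quantization and to provide a reference for theoretical research on and engineering implementation of efficient large models.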
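The discretization-error limit of real-domain quantization can be illustrated with a minimal sketch of uniform symmetric quantization. This is a generic textbook scheme chosen for illustration, not a method from the surveyed works; the `quantize_symmetric` helper, the per-tensor scale, and the random test weights are assumptions.

```python
import numpy as np


def quantize_symmetric(w, bits):
    """Quantize `w` to a signed `bits`-bit grid and map it back to floats."""
    qmax = 2 ** (bits - 1) - 1         # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = np.max(np.abs(w)) / qmax   # one scale for the whole tensor (assumption)
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                   # dequantized approximation of w


# Hypothetical weights: 4096 samples from a standard normal distribution.
rng = np.random.default_rng(0)
weights = rng.normal(size=4096)

# Mean squared reconstruction error at each bit width.
errors = {
    bits: float(np.mean((weights - quantize_symmetric(weights, bits)) ** 2))
    for bits in (8, 4, 2)
}
```

In this sketch the reconstruction error grows as the bit width shrinks, because fewer levels must cover the same value range; that growth is the discretization error the abstract refers to.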
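Abstract: Al-7Si alloy was solidified at different cooling rates using a 3 mm copper mold and 3 to 6 mm iron molds, and the effect of cooling rate on the alloy's microstructure and properties was analyzed through temperature measurement and characterization of microstructure and properties. The results show that when casting into the 3 mm Cu mold and the 3 to 6 mm Fe molds, the average cooling rate of the Al-7Si alloy before the eutectic temperature was 42.2 to 96.5 °C/s. As the cooling rate increased from 42.2 °C/s to 96.5 °C/s, both the grain size and the secondary dendrite arm spacing (SDAS) of the Al-7Si alloy decreased by more than 50%; the content of eutectic silicon phase in the central region of the alloy fell from (19.1±0.3)% to (13.5±0.2)%, mainly because the segregation of eutectic silicon in the central region of the sample changed from a concentrated distribution over large areas to a dispersed distribution over small areas. In addition, compared with casting in the 3 to 6 mm iron molds, the Al-7Si alloy cast in the 3 mm copper mold had the best tensile properties.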