IG-3D:Integrated-Gradients 3D Optimization for Private Transformer Inference
Authors: Lei Sun, Jingwen Wang, Peng Hu, Xiuqing Mao, Cuiyun Hu, Zhihong Wang. Computers, Materials & Continua, 2026, Issue 5, pp. 1158-1176 (19 pages)
Transformer models face significant computational challenges in private inference (PI). Existing optimization methods often rely on isolated techniques, neglecting joint structural and operational improvements. We propose IG-3D, a unified framework that integrates structured compression and operator approximation through accurate importance assessment. Our approach first evaluates attention-head importance using Integrated Gradients (IG), offering greater stability and theoretical soundness than gradient-based methods. We then apply a three-dimensional optimization: (1) structurally pruning redundant attention heads; (2) replacing Softmax with an adaptive polynomial approximation to avoid exponential computations; (3) implementing layer-wise GELU substitution to accommodate different layer characteristics. A joint threshold mechanism coordinates compression across dimensions under accuracy constraints. Experimental results on the GLUE benchmark show that our method achieves an average 2.9× speedup in inference latency and a 50% reduction in communication cost, while controlling the accuracy loss within 2.3%, demonstrating significant synergistic effects and a superior accuracy-efficiency trade-off compared to single-technique optimization strategies.
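The abstract does not specify the paper's adaptive polynomial, but the idea of replacing Softmax's exponential with a cheap polynomial is well established in private inference (e.g., quadratic "2Quad"-style surrogates). The sketch below is a minimal illustration of that general technique, not the authors' method; the function name `quad_softmax` and the clipping constant `c` are assumptions made for the example.

```python
import numpy as np

def quad_softmax(scores, c=5.0, axis=-1):
    """Quadratic surrogate for Softmax: exp(t) is replaced by (t + c)^2.

    Squaring and division are far cheaper than exponentiation under
    MPC/HE protocols, which is the motivation in private inference.
    """
    # Shift so the maximum logit is 0 (standard numerical-stability trick).
    s = scores - np.max(scores, axis=axis, keepdims=True)
    # Clip to [-c, 0] so the quadratic stays monotone increasing there,
    # preserving the ordering of the attention weights.
    s = np.clip(s, -c, 0.0)
    q = (s + c) ** 2
    return q / np.sum(q, axis=axis, keepdims=True)

# Usage: behaves like a (coarser) softmax over attention scores.
weights = quad_softmax(np.array([[1.0, 2.0, 3.0]]))
```

Because the surrogate is monotone on the clipped range, it keeps the relative ranking of attention scores while flattening the distribution somewhat; recovering full accuracy typically requires the kind of accuracy-constrained coordination the abstract describes.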
Keywords: private inference; Transformer; attention-head pruning; integrated gradients; transformer model optimization