Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. On the path toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. Accordingly, this study introduces a framework, inspired by model-based reinforcement learning, to determine the optimal splitting point across the edge and user equipment. By incorporating a reward surrogate model, our approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that this method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
Funding: This work was supported by the National Key Research and Development Program of China (No. 2024YFE0200600), the National Natural Science Foundation of China (No. 62071425), the Zhejiang Key Research and Development Plan, China (No. 2022C01093), the Zhejiang Provincial Natural Science Foundation of China (No. LR23F010005), the National Key Laboratory of Wireless Communications Foundation, China (No. 2023KP01601), and the Big Data and Intelligent Computing Key Lab of CQUPT, China (No. BDIC-2023-B-001).
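The surrogate-assisted split-point selection the abstract describes can be illustrated with a minimal sketch. The sketch assumes a hypothetical 32-layer model split after layer k (layers up to k run on the user equipment, the rest on the edge server), a stand-in reward function in place of the expensive end-to-end evaluation, and a generic regressor as the reward surrogate; every name, formula, and value below is illustrative rather than taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical setting: an LLM with N_LAYERS transformer layers can be split
# after any layer k; layers 1..k run on the user equipment (UE), the rest on
# the edge server. The "true" reward trades inference quality against UE
# compute load and uplink cost -- expensive to measure directly, so we fit a
# cheap surrogate on a small batch of evaluations and query the surrogate.

N_LAYERS = 32
rng = np.random.default_rng(0)

def true_reward(split: int, bandwidth_mbps: float) -> float:
    """Stand-in for an expensive real evaluation (illustrative formula only)."""
    ue_load = split / N_LAYERS                 # fraction of compute on the UE
    comm_cost = 1.0 / (bandwidth_mbps + 1e-3)  # slower links cost more
    quality = 1.0 - 0.02 * abs(split - 24)     # fictitious quality sweet spot
    return quality - 0.5 * ue_load - 2.0 * comm_cost

# 1) Collect a small batch of expensive evaluations at random operating points.
X, y = [], []
for _ in range(64):
    s = int(rng.integers(1, N_LAYERS))
    bw = float(rng.uniform(1.0, 50.0))
    X.append([s, bw])
    y.append(true_reward(s, bw))

# 2) Fit the reward surrogate on (split point, bandwidth) -> reward.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# 3) At decision time, score every candidate split with the cheap surrogate
#    instead of re-running the expensive evaluation.
def best_split(bandwidth_mbps: float) -> int:
    candidates = [[s, bandwidth_mbps] for s in range(1, N_LAYERS)]
    scores = surrogate.predict(candidates)
    return int(np.argmax(scores)) + 1

print(best_split(5.0))   # congested link: push less computation to the edge
print(best_split(40.0))  # fast link: offloading more layers becomes cheap
```

Under these assumptions, the surrogate is queried once per candidate split at negligible cost, which is the mechanism by which the paper's approach avoids frequent full performance evaluations as network conditions change.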