Journal Articles
3 articles found
Dynamic Batch Processing with FlexiDecode Scheduler for Efficient LLM Inference in IIoT
Authors: Xiaocong Jia, Bruce Gu, Jinjun Chen, Longxiang Gao, Weiguang Pang, Guangtong Lv, Youyang Qu, Lei Cui. Big Data Mining and Analytics, 2025, Issue 6, pp. 1307-1323 (17 pages).
Large Language Models (LLMs) are expanding their applications across various fields, including the Industrial Internet of Things (IIoT), where they analyze sensor data, automate diagnostics, and enhance predictive maintenance. LLM inference is provided by service providers to users, with each inference request undergoing two phases: prefill and decode. Due to the autoregressive nature of generation, only one token can be produced per iteration, necessitating multiple iterations to complete a request. Typically, batch processing groups multiple requests into a single batch for inference, improving throughput and hardware utilization. However, in service systems a fixed batch size presents challenges under fluctuating request volumes, particularly in IIoT environments, where data flow can vary significantly. Specifically, during high-load periods a fixed batch size may lead to underutilization of resources, while during low-load periods it may result in resource wastage. In this paper, we introduce the FlexiDecode Scheduler (FDS) to address these challenges by dynamically adjusting the decoding batch size based on system load conditions, improving resource utilization and reducing wait time during high-load periods. FDS prioritizes prefilling new requests to maximize decoding efficiency and employs a request output-length predictor to optimize request scheduling, minimizing End-to-End (E2E) latency. Compared to virtual Large Language Model (vLLM) and Sarathi, our approach achieves 23% and 16% reductions in E2E latency, improves actual request execution time by 34% and 15%, respectively, and increases computational utilization by 10%.
Keywords: virtual Large Language Model (vLLM) inference; batch scheduling; dynamic decoding batches; calculating utilization
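The abstract describes FDS only at a high level: the decoding batch size grows or shrinks with system load. The exact policy is not given, so the sketch below is an assumption-laden illustration of the general idea, with invented thresholds and a hypothetical `adjust_batch_size` helper, not the actual FDS rule.

```python
# Illustrative sketch only: the FDS policy is not specified in the abstract,
# so the thresholds and doubling/halving rule here are assumptions.

def adjust_batch_size(current_batch: int, queue_len: int,
                      min_batch: int = 4, max_batch: int = 64) -> int:
    """Grow the decode batch under high load, shrink it under low load."""
    if queue_len > 2 * current_batch:       # backlog building: widen the batch
        return min(current_batch * 2, max_batch)
    if queue_len < current_batch // 2:      # light load: avoid idle batch slots
        return max(current_batch // 2, min_batch)
    return current_batch

print(adjust_batch_size(8, 40))   # heavy backlog -> 16
print(adjust_batch_size(32, 10))  # light load -> 16
```

A real scheduler would also account for KV-cache memory limits and the prefill/decode split the abstract mentions; this sketch reacts to queue length alone.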
An adaptive agent-based approach for instant delivery order dispatching: Incorporating task buffering and dynamic batching strategies
Authors: Miaojia Lu, Xinyu Yan, Shadi Sharif Azadeh, Pengling Wang. International Journal of Transportation Science and Technology, 2024, Issue 1, pp. 137-154 (18 pages).
The volume of instant delivery has witnessed significant growth in recent years. Given the involvement of numerous heterogeneous stakeholders, instant delivery operations are inherently characterized by dynamics and uncertainties. This study introduces two order dispatching strategies, namely task buffering and dynamic batching, as potential solutions to address these challenges. The task buffering strategy aims to optimize the assignment timing of orders to couriers, thereby mitigating demand uncertainties. On the other hand, the dynamic batching strategy focuses on alleviating delivery pressure by assigning orders to couriers based on their residual capacity and extra delivery distances. To model the instant delivery problem and evaluate the performance of order dispatching strategies, an Adaptive Agent-Based Order Dispatching (ABOD) approach is developed, which combines agent-based modelling, deep reinforcement learning, and the Kuhn-Munkres algorithm. ABOD effectively captures the system's uncertainties and heterogeneity, facilitating stakeholder learning in novel scenarios and enabling adaptive task buffering and dynamic batching decision-making. The efficacy of the ABOD approach is verified through both synthetic and real-world case studies. Experimental results demonstrate that implementing the ABOD approach can lead to a significant increase in customer satisfaction, up to 275.42%, while simultaneously reducing the delivery distance by 11.38% compared to baseline policies. Additionally, the ABOD approach adaptively adjusts buffering times to maintain high levels of customer satisfaction across various demand scenarios. As a result, this approach offers valuable support to logistics providers in making informed decisions regarding order dispatching in instant delivery operations.
Keywords: instant delivery; task buffering; dynamic batching; agent-based modelling; deep reinforcement learning
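The abstract names the Kuhn-Munkres (Hungarian) algorithm as the matching component of ABOD. As a small illustration of the underlying minimum-cost assignment problem it solves, the brute-force routine below finds the same optimal order-to-courier matching on a toy cost matrix; the pair costs (extra delivery distance if courier j takes order i) are invented, and a real system would use a proper O(n³) Kuhn-Munkres implementation rather than enumeration.

```python
# Brute-force stand-in for Kuhn-Munkres on a toy instance: the invented
# cost[i][j] is the extra delivery distance if courier j takes order i.
from itertools import permutations

def min_cost_assignment(cost):
    """Return (best_total, assignment), where assignment[i] is the courier
    index for order i. Enumerates all permutations, so small matrices only."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

costs = [[4, 2, 8],
         [4, 3, 7],
         [3, 1, 6]]
print(min_cost_assignment(costs))  # (12, (0, 2, 1))
```

In practice one would call an off-the-shelf solver such as SciPy's `linear_sum_assignment` instead of enumerating permutations; the brute force here only serves to make the objective concrete.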
Fabrication scheduling on a single machine to minimize the weighted sum of product completion time
Authors: Wang Yuqing, Sun Shijie. Journal of Shanghai University (English Edition), CAS, 2007, Issue 2, pp. 109-114 (6 pages).
In this paper, a fabrication scheduling problem concerning the production of components at a single manufacturing facility is studied, in which the manufactured components are subsequently assembled into a finite number of end products. Each product is assumed to comprise a component common to all jobs and a component unique to itself. Common operations are processed in batches, and each batch requires a setup time. A product is completed when both of its operations have been processed and are available. The optimality criterion considered is the minimization of weighted flow time. For this scheduling problem, the optimal schedules are shown to follow a weighted shortest processing time first (WSPT) order, and two algorithms are constructed corresponding to batch availability and item availability, respectively.
Keywords: scheduling; product; weighted flow time; weighted shortest processing time first (WSPT); batch processing; dynamic programming
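The WSPT rule the abstract refers to can be stated compactly: sequence jobs in non-decreasing order of processing time divided by weight (equivalently, non-increasing weight-to-processing-time ratio), which minimizes weighted flow time on a single machine without the batching complications studied in the paper. A minimal sketch with invented job data:

```python
# WSPT rule: sort jobs by processing_time / weight ascending.
# The job data below is invented for illustration.

def wspt_order(jobs):
    """jobs: list of (name, processing_time, weight) tuples."""
    return sorted(jobs, key=lambda j: j[1] / j[2])

def weighted_flow_time(seq):
    """Sum of weight * completion_time over a schedule."""
    t = total = 0
    for _, p, w in seq:
        t += p           # completion time of this job
        total += w * t
    return total

jobs = [("A", 6, 2), ("B", 2, 1), ("C", 4, 4)]
order = wspt_order(jobs)
print([name for name, _, _ in order])   # ['C', 'B', 'A']
print(weighted_flow_time(order))        # 46
```

For comparison, scheduling the same jobs in the order A, B, C gives a weighted flow time of 68, so the WSPT sequence is strictly better on this instance. The paper's contribution is extending this ordering argument to the setting with common batched components and setup times.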