Study on Multi-agent Supply Chain Inventory Management Method Based on Improved Transformer
(Original title: 基于改进Transformer的多智能体供应链库存管理方法)

Cited by: 2

Abstract: Effective supply chain inventory management is crucial for large-scale manufacturing industries such as civil aircraft and automotive manufacturing, because it keeps production running efficiently. Typically, the main manufacturer formulates an annual inventory management plan and, following the actual production schedule, contacts suppliers when certain materials approach critical inventory levels. However, changes in actual production conditions may force changes to the annual plan, so making procurement decisions from actual production conditions and current inventory levels is more flexible and efficient. In recent years, many researchers have applied reinforcement learning to inventory management problems. Current methods can manage inventory with some efficiency in the multi-node, multi-material civil aircraft manufacturing supply chain, but at the cost of high complexity. To address this issue, we formalize the problem as a partially observable Markov decision process and propose a multi-agent supply chain inventory management method based on an improved Transformer. Building on the sequential decision-making nature of multi-agent reinforcement learning, the method recasts the multi-agent reinforcement learning problem as a sequence modeling problem with an encoder-decoder architecture, which reduces the complexity of the algorithm at the logical level. Experimental results show that, compared with existing reinforcement learning-based methods, the proposed method achieves an improvement of about 90% in complexity while maintaining similar performance.

Authors: PIAO Mingjie (朴明杰); ZHANG Dongdong (张冬冬); LU Hu (卢鹄); LI Rupeng (李汝鹏); GE Xiaoli (葛小丽) (College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China; Aviation Manufacturing Technology Research Institute, Shanghai Aircraft Manufacturing Co., Ltd., Shanghai 201324, China)

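Affiliations: College of Electronic and Information Engineering, Tongji University; Aviation Manufacturing Technology Research Institute, Shanghai Aircraft Manufacturing Co., Ltd.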

Source: Computer Science (《计算机科学》), Peking University Core Journal, 2025, Issue S1, pp. 186-195 (10 pages)

Funding: National Key Research and Development Program of China (2021YFB3301901).

Keywords: Multi-agent reinforcement learning; Aircraft supply chain inventory management; Partially observable Markov decision process; Transformer

Classification code: TP399 [Automation and Computer Technology: Computer Application Technology]