Abstract: To ensure on-time delivery of machine tools produced in a mixed-model assembly shop, a scheduling optimization method for machine-tool mixed-model assembly lines based on improved deep multi-agent reinforcement learning is proposed, addressing the low solution quality and slow training of the minimum-delay production scheduling optimization model. A mixed-model assembly line scheduling optimization model with the objective of minimizing delay time is constructed, and agents based on a decentralized double deep Q network (DDQN) are applied to learn the relationship between production information and the scheduling objective. The framework adopts a centralized-training, decentralized-execution strategy with parameter sharing, which handles the non-stationarity problem in multi-agent reinforcement learning. On this basis, a recurrent neural network is used to manage variable-length state and action representations, enabling the agents to handle problems of arbitrary scale. A global/local reward function is further introduced to address reward sparsity during training, and ablation experiments determine the optimal parameter combination. Numerical experiments show that, compared with standard benchmark schemes, the proposed algorithm improves the average total number of delayed jobs by 24.1%–32.3% and increases training speed by 8.3%.
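The core update behind the DDQN agents described above can be sketched as follows. This is a minimal illustration of the double-DQN target, where the online network selects the next action and the target network evaluates it; the function and variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double-DQN bootstrap target (illustrative sketch, not the paper's code):
    the online network picks the greedy next action, the target network
    supplies its value, decoupling selection from evaluation."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))        # action selection (online net)
    return reward + gamma * q_target_next[a_star]  # action evaluation (target net)

# Example: the online net prefers action 1; the target net values it at 2.0.
q_online = np.array([1.0, 3.0, 0.5])
q_target = np.array([0.8, 2.0, 4.0])
y = ddqn_target(reward=1.0, q_online_next=q_online,
                q_target_next=q_target, gamma=0.9)
# y = 1.0 + 0.9 * 2.0 = 2.8
```

With parameter sharing, as in the abstract, every agent would evaluate this same pair of networks, which keeps the joint policy consistent across agents during centralized training.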
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0900105)
Abstract: A novel distributed reinforcement learning (DRL) strategy is proposed in this study to coordinate current sharing and voltage restoration in an islanded DC microgrid. Firstly, a reward function considering both equal proportional current sharing and cooperative voltage restoration is defined for each local agent. The global reward of the whole DC microgrid, which is the sum of the local rewards, is regarded as the optimization objective for DRL. Secondly, by using the distributed consensus method, the predefined pinning consensus value that maximizes the global reward is obtained. An adaptive updating method is proposed to ensure stability of the above pinning consensus method under uncertain communication. Finally, the proposed DRL is implemented along with the synchronization-seeking process of the pinning reward, to maximize the global reward and achieve an optimal solution for the DC microgrid. Simulation studies with a typical DC microgrid demonstrate that the proposed DRL is computationally efficient and able to provide an optimal solution even when the communication topology changes.
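The two building blocks named in the abstract, a global reward defined as the sum of local agent rewards and a distributed consensus iteration, can be sketched minimally as follows. This is a generic illustration under assumed Metropolis-style weights, not the paper's adaptive pinning scheme.

```python
import numpy as np

def global_reward(local_rewards):
    """Global reward = sum of the local agents' rewards, as defined
    in the abstract for the whole DC microgrid."""
    return float(np.sum(local_rewards))

def consensus_step(x, W):
    """One synchronous consensus iteration x <- W x. For a doubly
    stochastic mixing matrix W, agents' values converge to the
    network average (illustrative sketch only)."""
    return W @ x

# Three fully connected agents with doubly stochastic weights.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([1.0, 2.0, 3.0])  # each agent's initial local estimate
for _ in range(50):
    x = consensus_step(x, W)
# all entries approach the network average, 2.0
```

In the paper's setting the consensus variable would instead track the pinning value that maximizes the global reward, with the weights adapted online to cope with uncertain communication; the sketch only shows the averaging mechanism such methods build on.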