In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to information about the other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with the incomplete information structure, players collect a period of system trajectory data to compensate for the lack of information. The policy updating step is implemented as a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, under the proximal policy searching rules, the approximated policies converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that the proposed method accelerates the learning process compared with the centralized learning framework.
This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models, with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. The method removes the dependence on admissible initial policies, which is one of the main drawbacks of policy iteration (PI)-based frameworks. Furthermore, the algorithm enables the players to adapt their control policies without full knowledge of the other players' system parameters or control laws. The efficacy of the method is illustrated by three examples.
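The point about admissible initial policies can be illustrated with a much simpler setting than the one in the abstract. The sketch below is an assumption-laden analogue, not the authors' algorithm: it runs textbook value iteration on a scalar, single-player, discrete-time linear-quadratic problem, whereas the paper treats multi-player differential (continuous-time) games. The function name `riccati_value_iteration` and the numbers `a`, `b`, `q`, `r` are hypothetical. The open-loop system is unstable (`a = 1.2`), yet the iteration starts from a zero value function and needs no stabilizing initial policy.

```python
def riccati_value_iteration(a, b, q, r, iters=200):
    """Value iteration for the scalar system x[t+1] = a*x[t] + b*u[t]
    with stage cost q*x^2 + r*u^2 and value function V(x) = p*x^2.

    Illustrative single-player analogue only (hypothetical helper).
    """
    p = 0.0  # zero initial value: no admissible (stabilizing) policy required
    for _ in range(iters):
        # Greedy feedback gain u = -k*x with respect to the current value p.
        k = a * b * p / (r + b * b * p)
        # Bellman backup: stage cost under u = -k*x plus value of next state.
        p = q + r * k * k + p * (a - b * k) ** 2
    k = a * b * p / (r + b * b * p)  # gain that is greedy for the final p
    return p, k


# Hypothetical example data: open-loop unstable because |a| > 1.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p, k = riccati_value_iteration(a, b, q, r)
print("value coefficient p:", p)
print("feedback gain k:", k)
print("closed-loop pole a - b*k:", a - b * k)
```

Policy iteration on the same problem would have to start from a gain with `|a - b*k| < 1`; value iteration trades that requirement for a larger number of cheaper sweeps.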
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control. It serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
A real supply chain consists of different parts, such as facilities, logistics warehouses, and retail stores, that handle common kinds of products; this situation is the background of this research. The parts deal with common quantities of their products, but because their environments differ, the production quantity that is optimal for one part can be unacceptable to another, which may then suffer a heavy loss. To avoid such situations, the common production quantities should be acceptable to all parts of the supply chain. The motivation of this research is therefore the need for a method that finds production quantities acceptable to all decision makers. Finding such quantities is difficult, however, and the decision makers' acceptable ranges do not always overlap. Similar situations arise in decision making for car design. The performance of a car comprises several objectives, such as fuel efficiency and size; improving one objective worsens another, so the objectives are in a tradeoff relationship. In these cases, the Suriawase process is applied. This process consists of negotiations and reviews of the requirements on the objectives. In the negotiation step, the requirements are shared among all decision makers, and the solution that satisfies them as far as possible is sought. In the review step, the requirements are revised based on the negotiation result if that result is unacceptable to some of the decision makers. By iterating these two steps, a solution that satisfies all decision makers is obtained.
Previous research quantified the effect of a single decision maker reviewing requirements in the Suriawase process, but did not present a mathematical model that modifies the ranges of production quantities of all decision makers simultaneously. This research therefore proposes a mathematical model of multi-player multi-objective decision making based on the Suriawase process. Its methods build on a model from previous research that uses linear physical programming (LPP) and robust optimization (RO) for multi-player multi-objective decision making. LPP is a multi-objective optimization method, and RO is used to balance the preference levels among decision makers. LPP requires the preference ranges of all objective functions, so this research assumes that they are available. The referenced research does not show a method for controlling the effect of RO; if that effect is too large, the average preference level becomes worse. The purpose of this research is to reproduce the mathematical model of multi-player multi-objective decision making based on the Suriawase process and to propose a method for controlling the effect of RO. With the proposed model, a set of solutions of the negotiation problem is obtained, as demonstrated by the result of a numerical experiment. The conclusion is that the proposed model can be used to obtain a set of solutions of negotiation problems in supply chains.
This paper studies the effect of veto rights on players' income in a multi-player dynamic bargaining game. Starting from a basic multi-person dynamic bargaining model generalized from Rubinstein's two-person alternating-offer bargaining model, the authors construct a dynamic multi-player bargaining game with veto players by adding a constraint to the negotiation process; the constraint is obtained by studying the influence of veto players exercising their veto right. The authors describe the strategic game form of this dynamic bargaining game in detail, study its equilibrium, and then analyze the relationship between the minimum acceptable payoff of the veto players and the equilibrium income. The research shows that the veto right may increase the benefits of veto players and decrease the benefits of non-veto players. Veto players do not affect the players' benefits or the form of the equilibrium when the minimum acceptable payoff of every veto player is relatively low. When the minimum acceptable payoff of a veto player is high enough, he can only get the minimum acceptable payoff, and his benefit increases as his minimum acceptable payoff increases. In this case, the veto player has an incentive to obtain more resources by presenting a higher minimum acceptable payoff.
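For orientation, the two-person baseline that the abstract generalizes has a well-known closed form. The sketch below computes the standard subgame-perfect split of Rubinstein's alternating-offer game under the usual textbook assumptions: a pie of size 1, an infinite horizon, discount factors strictly between 0 and 1, and player 1 proposing first. It does not model the paper's multi-player game or its veto constraint; the function name `rubinstein_shares` and the discount factors in the example are hypothetical.

```python
def rubinstein_shares(delta1, delta2):
    """Equilibrium shares (proposer, responder) in Rubinstein's two-person
    alternating-offer bargaining game over a pie of size 1.

    delta1 is the discount factor of the first proposer (player 1),
    delta2 that of the responder (player 2). Hypothetical helper.
    """
    if not (0.0 < delta1 < 1.0 and 0.0 < delta2 < 1.0):
        raise ValueError("discount factors must lie strictly between 0 and 1")
    # Player 1 offers player 2 exactly what player 2 would get, discounted,
    # by rejecting and proposing in the next round.
    proposer = (1.0 - delta2) / (1.0 - delta1 * delta2)
    responder = 1.0 - proposer
    return proposer, responder


# Hypothetical example: equally patient players.
x1, x2 = rubinstein_shares(0.9, 0.9)
print("proposer share:", x1)
print("responder share:", x2)
```

With equal discount factors the proposer receives more than half of the pie, and the advantage shrinks as both players become more patient.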
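This series of three papers designs DSPMPS (dynamic simulation platform for power market & power system), a simulation platform for studying the dynamic interaction between power markets and power systems. As the first paper in the series, this paper reviews the requirements of power market simulation and the supporting technologies of the simulation platform, and sets out the design goals. Based on three elements (information acquisition, knowledge extraction, and decision support), it presents the functional design of DSPMPS, which serves as the basis for designing the supporting layer and the application layer. The problems it focuses on include experimental economics methods, cross-domain dynamic interactive simulation, online quantitative risk analysis, and decision support with multiple objectives and multiple control measures. Research work already carried out has verified the effectiveness of the functional design.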
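The first two papers of this three-paper series introduced, respectively, the functional design and the supporting layer of the dynamic simulation platform for power market & power system (DSPMPS). Using the simulation application development interfaces and tools provided by the supporting layer, and following the functional design requirements, this paper builds various scalable applications from software building blocks; these applications reflect generalized congestion and support graphical trial-and-error decision making. Frequently used models are extracted from typical problems of dynamic interaction between physical and economic systems, and a simulation component library is designed to improve component reusability. FASTEST, a powerful software package for quantitative power system stability analysis, is adopted as the physical simulation engine. Experiment scripts written in a simulation markup language configure various multi-role, multi-participant simulation applications. A typical simulation experiment illustrates the design features of the platform.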
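This series of three papers introduces the dynamic simulation platform for power market & power system (DSPMPS). Building on the functional design presented in the first paper, this paper establishes simulation application development interfaces that support extending various kernel components on both the server side and the client side. Using information technology, multi-role games are set up as visualized, networked experiments; event-driven and service-oriented mechanisms simulate complex dynamic interactions across multiple scales of time, space, and objectives; standardized, building-block-style construction enables efficient and flexible user customization and extension development; and an open, efficient multi-core computing architecture is created in accordance with international standards. The next paper will describe how various simulation applications that reflect generalized congestion and support graphical trial-and-error decision making are built on the supporting layer from software building blocks.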
Funding (distributed ADP value iteration paper): supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437).
Funding (cooperative VI-based ADP paper): supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (USCAST2022-11) and the Aeronautical Science Foundation of China (20220001057001).
Funding (veto bargaining paper): supported by the National Natural Science Foundation of China under Grant No. 71871171.