In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to the information of others' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with the incomplete information structure, players collect a period of system trajectory data to compensate for the lack of information. The policy updating step is implemented by a nonlinear optimization problem aiming to search for the proximal admissible policy. Theoretical analysis shows that by adopting proximal policy searching rules, the approximated policies can converge to a neighborhood of equilibrium policies. The efficacy of our method is illustrated by three examples, which also demonstrate that the proposed method can accelerate the learning process compared with the centralized learning framework.
The real-time path optimization for heterogeneous vehicle fleets in large-scale road networks presents significant challenges due to conflicting traffic demands and imbalanced resource allocation. While existing vehicle-to-infrastructure coordination frameworks partially address congestion mitigation, they often neglect priority-aware optimization and exhibit algorithmic bias toward dominant vehicle classes, which are critical limitations in mixed-priority scenarios involving emergency vehicles. To bridge this gap, this study proposes a preference game-theoretic coordination framework with an adaptive strategy transfer protocol, explicitly balancing system-wide efficiency (measured by network throughput) with priority vehicle rights protection (quantified via time-sensitive utility functions). The approach combines (1) a multi-vehicle dynamic routing model with quantifiable preference weights, and (2) a distributed Nash equilibrium solver updated using replicator sub-dynamic models. The framework was evaluated on an urban road network containing 25 intersections with mixed priority ratios (10%–30% of vehicles with priority access demand), and it showed consistent benefits against four benchmarks (the social routing algorithm, the shortest path algorithm, the comprehensive path optimisation model, and the emergency vehicle timing collaborative evolution path optimization method). Results show that across different traffic demand configurations, the proposed method reduces the average vehicle traveling time by at least 365 s, increases the road network throughput by 48.61%, and effectively balances the road loads. This approach successfully meets the diverse traffic demands of various vehicle types while optimizing road resource allocations. The proposed coordination paradigm advances theoretical foundations for fairness-aware traffic optimization while offering implementable strategies for next-generation cooperative vehicle-road systems, particularly in smart city deployments requiring mixed-priority mobility guarantees.
To address the confrontation decision-making issues in multi-round air combat, a dynamic game decision method based on a decision tree is proposed for the confrontation of unmanned aerial vehicle (UAV) air combat. Based on game theory and the confrontation characteristics of air combat, a dynamic game process is constructed including the strategy sets, the situation information, and the maneuver decisions for both sides of air combat. By analyzing the UAV's flight dynamics and both sides' information, a payment matrix is established through the situation advantage function, performance advantage function, and profit function. Furthermore, the dynamic game decision problem is solved based on the linear induction method to obtain the Nash equilibrium solution, where the decision tree method is introduced to obtain the optimal maneuver decision, thereby improving the situation advantage in the next round of confrontation. According to the analysis, the simulation results for the confrontation scenarios of multi-round air combat are presented to verify the effectiveness and advantages of the proposed method.
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet sampling strategies, while attackers adjust their tactics to reach rapid equilibrium. Furthermore, dynamic deception techniques reduce the time required for the concealment of attacks and the overall system burden. A lightweight behavioral fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) compared to existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
As an efficient method of solving for the subgame-perfect Nash equilibrium, backward induction is analyzed from an evolutionary point of view in this paper. Each player is replaced with a population, turning the game into a population game. The analysis shows that the equilibrium of a perfect information game is the unique evolutionarily stable outcome for dynamic models in the limit.
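As a point of reference for the abstract above, the classical (non-evolutionary) backward induction procedure can be sketched in a few lines. This is only an illustrative sketch, not the paper's population-game analysis: the `Node` class, the `backward_induction` function, and the two-player entry game used as input are all hypothetical names and data introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of a finite perfect-information game tree (hypothetical structure)."""
    player: Optional[int] = None        # index of the player who moves; None at a leaf
    payoffs: Optional[tuple] = None     # payoff vector, set only at leaves
    children: dict = field(default_factory=dict)  # action label -> child Node


def backward_induction(node):
    """Return (payoff vector, actions along the induced path) for the subtree."""
    if not node.children:
        return node.payoffs, []
    best = None
    for action, child in node.children.items():
        value, path = backward_induction(child)
        # The mover keeps the action that maximizes its own payoff component.
        if best is None or value[node.player] > best[0][node.player]:
            best = (value, [action] + path)
    return best


# Assumed example: player 0 stays out or enters; player 1 then fights or accommodates.
game = Node(player=0, children={
    "out": Node(payoffs=(1, 3)),
    "in": Node(player=1, children={
        "fight": Node(payoffs=(0, 0)),
        "accommodate": Node(payoffs=(2, 1)),
    }),
})

payoffs, path = backward_induction(game)
print(payoffs, path)
```

In this assumed game, player 1 would accommodate after entry, so player 0 prefers to enter. Ties are broken in favor of the first action listed, which is a simplification; with ties a game can have several subgame-perfect equilibria.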
One of the assumptions of previous research in evolutionary game dynamics is that individuals use only one rule to update their strategy. In reality, an individual's strategy update rules may change with the environment, and it is possible for an individual to use two or more rules to update their strategy. We consider the case where an individual updates strategies based on the Moran and imitation processes, and establish mixed stochastic evolutionary game dynamics by combining both processes. Our aim is to study how individuals change strategies based on two update rules and how this affects evolutionary game dynamics. We obtain an analytic expression and properties of the fixation probability and fixation times (the unconditional fixation time or conditional average fixation time) associated with our proposed process. We find unexpected results. The fixation probability within the proposed model is independent of the probabilities that the individual adopts the imitation rule update strategy. This implies that the fixation probability within the proposed model is equal to that from the Moran and imitation processes. The one-third rule holds in the proposed mixed model. However, under weak selection, the fixation times are different from those of the Moran and imitation processes because they depend on the probability that individuals adopt an imitation update rule. Numerical examples are presented to illustrate the relationships between fixation times and the probability that an individual adopts the imitation update rule, as well as between fixation times and selection intensity. From the simulated analysis, we find that the fixation time for a mixed process is greater than that of the Moran process, but is less than that of the imitation process. Moreover, the fixation times for a cooperator in the proposed process increase as the probability of adopting an imitation update increases; however, the relationship becomes more complex than a linear relationship.
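To make the notion of fixation probability concrete, the following sketch evaluates the standard closed-form expression for a single mutant in a frequency-dependent Moran process, rho = 1 / (1 + sum_{k=1}^{N-1} prod_{i=1}^{k} g_i / f_i). It covers only the pure Moran process, not the mixed Moran and imitation process of the abstract. The function name, the payoff entries `a, b, c, d`, and the linear fitness mapping `1 - w + w * payoff` are assumptions made for illustration.

```python
def moran_fixation_probability(N, a, b, c, d, w):
    """Fixation probability of one A-mutant among N - 1 B-residents.

    Assumed payoff matrix (row player's payoff):
        A vs A = a, A vs B = b, B vs A = c, B vs B = d
    w is the selection intensity; fitness = 1 - w + w * average payoff.
    """
    total = 1.0    # running value of 1 + sum over k of the products
    product = 1.0  # running product of g_i / f_i for i = 1..k
    for i in range(1, N):
        # Average payoffs when i individuals play A (self-interaction excluded).
        payoff_A = (a * (i - 1) + b * (N - i)) / (N - 1)
        payoff_B = (c * i + d * (N - i - 1)) / (N - 1)
        f_i = 1 - w + w * payoff_A
        g_i = 1 - w + w * payoff_B
        product *= g_i / f_i
        total += product
    return 1.0 / total


# Neutral drift (w = 0): every individual is equally likely to take over.
print(moran_fixation_probability(N=10, a=3, b=0, c=5, d=1, w=0.0))
```

With `w = 0` the payoffs drop out and the result reduces to `1 / N`, which is a convenient sanity check. With constant fitness `r` for the mutant (for example `a = b = r`, `c = d = 1`, `w = 1`) the sum becomes geometric and gives the familiar `(1 - 1/r) / (1 - 1/r**N)`.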
In this paper, we study the dynamic demand response problem in the smart grid in order to control energy consumption. The objective of the energy consumption control is constructed based on a differential game, as the dynamics of each user's energy state in the smart grid can be described by a differential equation. The concept of electricity sharing is introduced to shift the load of main users from the high-price hours to the low-price hours. The Nash equilibrium is given based on the Hamilton equation, and the effectiveness of the proposed model is verified by numerical simulation results.
This paper introduces a model-free reinforcement learning technique that is used to solve a class of dynamic games known as dynamic graphical games. The graphical game results from multi-agent dynamical systems, where pinning control is used to make all the agents synchronize to the state of a command generator or a leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. The Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution for the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. The Nash equilibrium solution for the graphical game is given in terms of the solution to the underlying coupled Hamilton-Jacobi-Bellman equations. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game. This algorithm does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the inter-connectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm to solve the graphical game online in real time.
A power source–power grid coordinated typhoon defense strategy is proposed in this study to minimize the cost of power grid anti-typhoon reinforcement measures and improve defense efficiency. It is based on multiagent dynamic game theory. This strategy regards a typhoon as a rational gamer that always causes the greatest damage. Together with the grid planner and black start unit (BSU) planner, it forms a multiagent defense–attack–defense dynamic game model naturally. The model is adopted to determine the optimal reinforcements for the transmission lines, black start power capacity, and location. Typhoon Hato, which struck a partial coastal area in Guangdong province in China in 2017, was adopted to formulate a step-by-step model of a typhoon attacking coastal area power systems. The results were substituted into the multiagent defense–attack–defense dynamic game model to obtain the optimal transmission line reinforcement positions, as well as optimal BSU capacity and geographic positions. An effective typhoon defense strategy and minimum load shedding were achieved, demonstrating the feasibility and correctness of the proposed strategy. The related theories and methods of this study have positive significance for the prevention of uncertain large-scale natural disasters.
Network security defence currently draws on game theory, mostly applying complete-information game models or even static game models. To get closer to actual networks and defend actively, we propose a network attack-defence game model based on a signalling game, which is modelled as a dynamic game with incomplete information. We improve the traditional attack-defence strategy quantization method to meet the needs of the network signalling game model. Moreover, we give the calculation of the game equilibrium and analyse the optimal defence scheme. Finally, we analyse and verify the effectiveness of the model and method through a simulation experiment.
Cooperative autonomous air combat of multiple unmanned aerial vehicles (UAVs) is one of the main combat modes in future air warfare, which becomes even more complicated with highly changeable situations and uncertain information about the opponents. As such, this paper presents a cooperative decision-making method based on an incomplete information dynamic game to generate maneuver strategies for multiple UAVs in air combat. Firstly, a cooperative situation assessment model is presented to measure the overall combat situation. Secondly, an incomplete information dynamic game model is proposed to model the dynamic process of air combat, and a dynamic Bayesian network is designed to infer the tactical intention of the opponent. Then a reinforcement learning framework based on multiagent deep deterministic policy gradient is established to obtain the perfect Bayes-Nash equilibrium solution of the air combat game model. Finally, a series of simulations are conducted to verify the effectiveness of the proposed method, and the simulation results show effective synergies and cooperative tactics.
With the explosive growth of high-speed wireless data demand and the number of mobile devices, fog radio access networks (F-RAN) with a multi-layer network structure have become a hot topic in recent research. Meanwhile, due to the rapid growth of mobile communication traffic, high cost, and the scarcity of wireless resources, it is especially important to develop an efficient radio resource management mechanism. In this paper, we focus on the problem of resource waste, and we consider the actual situation of base station dynamic coverage and user requirements. We propose a spectrum pricing and allocation scheme based on a Stackelberg game model under the F-RAN framework, realizing the allocation of resources on demand. This scheme studies the double game between the users and the operators, as well as between the traditional operators and the virtual operators, maximizing the profits of the operators. At the same time, spectrum reuse technology is adopted to improve the utilization of network resources. The simulation results verify that our proposed scheme can not only avoid resource waste, but also effectively improve the operator's revenue efficiency and overall network resource utilization.
This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. Our method removes the dependence on admissible initial policies, which is one of the main drawbacks of the PI-based frameworks. Furthermore, this algorithm enables the players to adapt their control policies without full knowledge of others' system parameters or control laws. The efficacy of our method is illustrated by three examples.
In this paper, we consider distributed Nash equilibrium (NE) seeking in potential games over a multi-agent network, where each agent cannot observe the actions of all its rivals. Based on the best response dynamics, we design a distributed NE seeking algorithm by incorporating the non-smooth finite-time average tracking dynamics, where each agent only needs to know its own action and exchange information with its neighbours through a communication graph. We give a sufficient condition for the Lipschitz continuity of the best response mapping for potential games, and then prove the convergence of the proposed algorithm based on the Lyapunov theory. Numerical simulations are given to verify the result and illustrate the effectiveness of the algorithm.
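The best response dynamics mentioned in the abstract above can be illustrated on a toy potential game. The sketch below is a centralized, full-information version on a simple congestion game, so it does not reproduce the paper's distributed algorithm or its average tracking dynamics. The game (each player picks one of a few resources and pays the number of players sharing it) and the function name are assumptions chosen for illustration.

```python
def best_response_dynamics(choices, n_resources=2, max_rounds=100):
    """Sequential best responses in an assumed congestion game.

    choices[i] is the resource picked by player i; a player's cost is the
    number of players (itself included) on its resource. Congestion games
    are potential games, so strict improvements must eventually stop.
    """
    choices = list(choices)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(choices)):
            # Load on each resource from the other players only.
            loads = [choices.count(r) for r in range(n_resources)]
            loads[choices[i]] -= 1
            best = min(range(n_resources), key=lambda r: loads[r] + 1)
            # Switch only on a strict improvement to avoid cycling on ties.
            if loads[best] + 1 < loads[choices[i]] + 1:
                choices[i] = best
                changed = True
        if not changed:
            break  # no player can improve: a pure Nash equilibrium
    return choices


print(best_response_dynamics([0, 0, 0, 0]))
```

Starting from all four players on resource 0, the players spread out until the loads are balanced, at which point no unilateral switch lowers anyone's cost.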
Social interaction with peer pressure is widely studied in social network analysis. Game theory can be utilized to model dynamic social interaction, and one class of game network models assumes that people's decision payoff functions hinge on individual covariates and the choices of their friends. However, peer pressure would be misidentified and induce a non-negligible bias when incomplete covariates are involved in the game model. For this reason, we develop a generalized constant peer effects model based on homogeneity structure in dynamic social networks. The new model can effectively avoid bias through homogeneity pursuit and can be applied to a wider range of scenarios. To estimate peer pressure in the model, we first present two algorithms based on the initialize-expand-merge method and the polynomial-time two-stage method to estimate homogeneity parameters. Then we apply the nested pseudo-likelihood method and obtain consistent estimators of peer pressure. Simulation evaluations show that our proposed methodology can achieve desirable and effective results in terms of the community misclassification rate and parameter estimation error. We also illustrate the advantages of our model in the empirical analysis when compared with a benchmark model.
After building a dynamic evolutionary game model, the essay studies the stability of the equilibrium in the game between the commercial banks and the closed-loop supply chain (CLSC) enterprises. Through the design of a systematic mechanism based on system dynamics theory, the capital chains of independent small and medium-sized enterprises (SMEs) on the CLSC are organically linked together. Moreover, a comparative simulation is carried out for the previous independent systems and the post-design dependent systems. The study shows that with business expanding and market risk growing, the independent finance chains of SMEs on the CLSC often exhibit a certain vulnerability, while the SMEs closed-loop supply chain finance system itself exhibits strong rigidity and concerto.
With the rapid improvement of urbanization and industrialization in countries around the world, how to effectively address the rapid demise of traditional villages is a social dilemma faced by all countries, which is why a series of relevant protection regulations have been promulgated in different historical periods. However, the formulation of relevant policies is still not scientific, universal, and long-term. In this study, we constructed an evolutionary game model of local governments and residents based on evolutionary game theory (EGT), which is used to explore the evolutionary stability strategy (ESS) and stability conditions of stakeholders under the premise of mutual influence and restriction. The study also analyzed the impacts of different influence factors on the evolution tendency of the game model. Numerical simulation examples were used to verify the theoretical results, and three crucial conclusions have been drawn. Firstly, the strategic evolution of stakeholders is a dynamic process of continuous adjustment and optimization, and its results and speed show consistent interdependence. Secondly, the decision-making of stakeholders mainly depends on the basic cost, and a high cost of investment is not conducive to the protection of traditional villages. Thirdly, the dynamic evolutionary mechanism composed of different influence factors will have an impact on the direction and speed of decision-making of stakeholders, which provides the basis for them to effectively restrict the decision-making of each other. This study addresses the weaknesses of existing research approaches and provides scientific and novel ideas for the protection of traditional villages, which can contribute to the formulation and improvement of the relevant laws and regulations.
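The idea of an evolutionarily stable state can be illustrated with the textbook single-population replicator equation, dx/dt = x (1 - x) (f_A - f_B). The sketch below is not the two-population government and resident model of the abstract above; the Hawk-Dove payoffs (V = 2, C = 4), the Euler step size, and the function name are all assumptions chosen so that the interior rest point x* = V / C = 0.5 is easy to check.

```python
def replicator_trajectory(x0, payoff, dt=0.01, steps=5000):
    """Euler integration of the two-strategy replicator equation.

    x is the share of strategy A. payoff = ((a, b), (c, d)) is an assumed
    payoff matrix: A vs A = a, A vs B = b, B vs A = c, B vs B = d.
    """
    (a, b), (c, d) = payoff
    x = x0
    for _ in range(steps):
        f_A = a * x + b * (1 - x)   # expected payoff of strategy A
        f_B = c * x + d * (1 - x)   # expected payoff of strategy B
        x += dt * x * (1 - x) * (f_A - f_B)
    return x


# Assumed Hawk-Dove game with resource value V = 2 and fighting cost C = 4.
hawk_dove = ((-1.0, 2.0), (0.0, 1.0))
print(replicator_trajectory(0.1, hawk_dove))
print(replicator_trajectory(0.9, hawk_dove))
```

Both starting points drift toward a Hawk share of about one half, which is the sense in which the mixed state is stable under these assumed payoffs. A forward Euler scheme is used only for brevity; an adaptive ODE solver would be the more careful choice for stiff or long-horizon runs.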
In this study, we construct a bi-level optimization model based on the Stackelberg game and propose a robust optimization algorithm for solving the bi-level model, assuming an actual situation with several participants in energy trading. Firstly, the energy trading process between each subject is analyzed based on the establishment of the operation framework of multi-agent participation in energy trading. Secondly, the optimal operation model of each energy trading agent is established to develop a bi-level game model including each energy participant. Finally, a combination algorithm of improved robust optimization over time (ROOT) and CPLEX is proposed to solve the established game model. The experimental results indicate that under different fitness thresholds, the robust optimization results of the proposed algorithm are increased by 56.91% and 68.54%, respectively. The established bi-level game model effectively balances the benefits of different energy trading entities. The proposed algorithm can increase the income of each participant in the game by an average of 8.59%.
Path planning is a fundamental component in robotics and game artificial intelligence that considerably influences the motion efficiency of robots and unmanned aerial vehicles, as well as the realism and immersion of virtual environments. However, traditional algorithms are often limited to single-objective optimization and lack real-time adaptability to dynamic environments. This study addresses these limitations through a proposed real-time dynamic multiobjective (RDMO) path-planning algorithm based on an enhanced A* framework. The proposed algorithm employs a queue-based structure and composite multiheuristic functions to dynamically manage game tasks and compute optimal paths under changing-map-connectivity conditions in real time. Simulation experiments are conducted using real-world road network data and benchmarked against mainstream hybrid approaches based on genetic algorithms (GAs) and simulated annealing (SA). The results show that the computational speed of the RDMO algorithm is 88 and 73 times faster than that of the GA- and SA-based solutions, respectively, while the total planned path length is reduced by 58% and 33%, respectively. In addition, the RDMO algorithm also shows excellent responsiveness to dynamic changes in map connectivity and can achieve real-time replanning with minimal computational overhead. The research results indicate that the RDMO algorithm provides a robust and efficient solution for multiobjective path planning in games and robotics applications and has great application potential in improving system performance and user experience in related fields in the future.
A method of the parallel computation of the linear quadratic non-cooperative dynamic games problem is proposed. The Lyapunov function is introduced, through which the form adapted to parallel computation of the open loop Nash equilibrium strategies is gi...
Funding (distributed ADP framework for multi-player differential games): supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437).
Funding (preference game-theoretic coordination framework for heterogeneous vehicle fleets): funded by the National Key Research and Development Program Project 2022YFB4300404.
Funding (decision-tree-based dynamic game decision method for UAV air combat): supported by the Major Projects for Science and Technology Innovation 2030 (2018AAA0100805).
Funding (ZSG-MAD3PG zero-day defense framework for the IIoT): funded in part by the Humanities and Social Sciences Planning Foundation of Ministry of Education of China under Grant No. 24YJAZH123, the National Undergraduate Innovation and Entrepreneurship Training Program of China under Grant No. 202510347069, and the Huzhou Science and Technology Planning Foundation under Grant No. 2023GZ04.
Abstract: Backward induction, an efficient method for solving for subgame-perfect Nash equilibria, is analyzed from an evolutionary point of view in this paper. Each player is replaced with a population, turning the game into a population game. The analysis shows that the equilibrium of a perfect-information game is the unique evolutionarily stable outcome for dynamic models in the limit.
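To make the classical (non-evolutionary) procedure concrete, the following is a minimal sketch of backward induction on a finite perfect-information game tree. The `Node` class, the `backward_induction` function, and the entry-deterrence example are illustrative assumptions, not taken from the paper, and ties between actions are broken in favor of the first action listed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A node of a finite perfect-information game tree (hypothetical helper)."""
    player: Optional[int] = None            # index of the player who moves; None at a leaf
    payoffs: Tuple[float, ...] = ()         # payoff vector, used only at leaves
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> subtree


def backward_induction(node: Node) -> Tuple[Tuple[float, ...], List[str]]:
    """Return the payoff vector and the action path selected by backward induction."""
    if not node.children:                   # leaf: nothing left to decide
        return node.payoffs, []
    best_action, best_payoffs, best_path = None, None, None
    for action, child in node.children.items():
        payoffs, path = backward_induction(child)   # solve the subgame first
        # The mover keeps the action that maximizes their own payoff.
        if best_payoffs is None or payoffs[node.player] > best_payoffs[node.player]:
            best_action, best_payoffs, best_path = action, payoffs, path
    return best_payoffs, [best_action] + best_path


# Entry-deterrence example: player 0 (entrant) moves first, player 1 (incumbent) replies.
game = Node(player=0, children={
    "out": Node(payoffs=(0, 2)),
    "in": Node(player=1, children={
        "fight": Node(payoffs=(-1, -1)),
        "accommodate": Node(payoffs=(1, 1)),
    }),
})

payoffs, path = backward_induction(game)
print(payoffs, path)
```

In this example the incumbent prefers to accommodate once entry has occurred, so the entrant enters; the threat to fight is not credible, which is exactly what subgame perfection rules out.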
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 71871171, 71871173, and 71832010).
Abstract: One of the assumptions of previous research in evolutionary game dynamics is that individuals use only one rule to update their strategy. In reality, an individual's strategy update rules may change with the environment, and an individual may use two or more rules to update their strategy. We consider the case where an individual updates strategies based on the Moran and imitation processes, and establish mixed stochastic evolutionary game dynamics by combining both processes. Our aim is to study how individuals change strategies based on two update rules and how this affects evolutionary game dynamics. We obtain an analytic expression and properties of the fixation probability and fixation times (the unconditional fixation time or the conditional average fixation time) associated with the proposed process. We find unexpected results. The fixation probability within the proposed model is independent of the probability that an individual adopts the imitation update rule. This implies that the fixation probability within the proposed model is equal to that of the Moran and imitation processes. The one-third rule holds in the proposed mixed model. However, under weak selection, the fixation times differ from those of the Moran and imitation processes because they depend on the probability that individuals adopt the imitation update rule. Numerical examples are presented to illustrate the relationships between fixation times and the probability that an individual adopts the imitation update rule, as well as between fixation times and selection intensity. From the simulations, we find that the fixation time for the mixed process is greater than that of the Moran process but less than that of the imitation process. Moreover, the fixation times for a cooperator in the proposed process increase as the probability of adopting the imitation update rule increases; however, the relationship is more complex than a linear one.
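As a point of reference for the fixation probability discussed above, here is a minimal sketch of the standard closed-form fixation probability of a single mutant in a frequency-dependent Moran process. The abstract does not state its formula; the payoff parametrization (`a`, `b`, `c`, `d`), the linear fitness mapping `1 - w + w * payoff`, and the function name are assumptions taken from the textbook Moran model, not from the paper.

```python
def fixation_probability(a: float, b: float, c: float, d: float,
                         N: int, w: float) -> float:
    """Fixation probability of one A-mutant among N - 1 B-residents (Moran process).

    Payoffs (assumed 2x2 game): A vs A -> a, A vs B -> b, B vs A -> c, B vs B -> d.
    w is the selection intensity; w = 0 is neutral drift.
    """
    total = 1.0      # running value of 1 + sum_{k=1}^{N-1} prod_{i=1}^{k} g_i / f_i
    product = 1.0    # running value of prod_{i=1}^{k} g_i / f_i
    for i in range(1, N):
        # Average payoffs when i individuals play A (self-interaction excluded).
        payoff_A = (a * (i - 1) + b * (N - i)) / (N - 1)
        payoff_B = (c * i + d * (N - i - 1)) / (N - 1)
        f_i = 1 - w + w * payoff_A   # fitness of an A-player
        g_i = 1 - w + w * payoff_B   # fitness of a B-player
        product *= g_i / f_i
        total += product
    return 1.0 / total


# Neutral drift: every individual is equally likely to take over the population.
rho_neutral = fixation_probability(a=3, b=1, c=4, d=2, N=50, w=0.0)

# Constant relative fitness r = 2 for A (a = b = 2, c = d = 1, w = 1).
rho_constant = fixation_probability(a=2, b=2, c=1, d=1, N=10, w=1.0)
print(rho_neutral, rho_constant)
```

Two sanity checks follow directly from the formula: under neutral drift the result is 1/N, and under constant relative fitness r it reduces to (1 - 1/r) / (1 - 1/r^N).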
Funding: Supported by the National Key R&D Program of China (No. 2018YFB1003905) and the Fundamental Research Funds for the Central Universities (No. FRF-TP-18-008A3).
Abstract: In this paper, we study the dynamic demand response problem in the smart grid in order to control energy consumption. The energy consumption control objective is formulated as a differential game, since the dynamics of each user's energy state in the smart grid can be described by a differential equation. The concept of electricity sharing is introduced to shift the load of main users from high-price hours to low-price hours. The Nash equilibrium is derived from the Hamilton equation, and the effectiveness of the proposed model is verified by numerical simulation results.
Funding: Supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals Project (No. JF141002), the National Science Foundation (No. ECCS-1405173), the Office of Naval Research (Nos. N000141310562, N000141410718), the U.S. Army Research Office (No. W911NF-11-D-0001), the National Natural Science Foundation of China (No. 61120106011), and the Project 111 from the Ministry of Education of China (No. B08015).
Abstract: This paper introduces a model-free reinforcement learning technique that is used to solve a class of dynamic games known as dynamic graphical games. The graphical game results from multi-agent dynamical systems, where pinning control is used to make all the agents synchronize to the state of a command generator or a leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. Hamiltonian mechanics is used to derive the necessary conditions for optimality. The Nash equilibrium solution for the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game. This algorithm does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the inter-connectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
Funding: Supported by the National Natural Science Foundation of China (No. U1766204).
Abstract: A power source–power grid coordinated typhoon defense strategy based on multiagent dynamic game theory is proposed in this study to minimize the cost of power grid anti-typhoon reinforcement measures and improve defense efficiency. This strategy regards a typhoon as a rational player that always causes the greatest damage. Together with the grid planner and the black start unit (BSU) planner, it naturally forms a multiagent defense–attack–defense dynamic game model. The model is used to determine the optimal transmission line reinforcements and the optimal black start power capacity and location. Typhoon Hato, which struck part of the coastal area of Guangdong province, China, in 2017, was used to formulate a step-by-step model of a typhoon attacking coastal power systems. The results were substituted into the multiagent defense–attack–defense dynamic game model to obtain the optimal transmission line reinforcement positions, as well as the optimal BSU capacity and geographic positions. An effective typhoon defense strategy and minimum load shedding were achieved, demonstrating the feasibility and correctness of the proposed strategy. The related theories and methods of this study are relevant to the prevention of uncertain large-scale natural disasters.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61303074 and 61309013, and the Henan Province Science and Technology Project Funds under Grant No. 12210231002.
Abstract: Current network security defence draws on game theory, mostly applying complete-information game models or even static game models. To get closer to real networks and to defend actively, we propose a network attack-defence game model based on a signalling game, which is dynamic and has incomplete information. We improve the traditional quantization method for attack-defence strategies to meet the needs of the network signalling game model. Moreover, we give the calculation of the game equilibrium and analyse the optimal defence scheme. Finally, we analyse and verify the effectiveness of the model and method through a simulation experiment.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61933010 and 61903301) and the Shaanxi Aerospace Flight Vehicle Design Key Laboratory.
Abstract: Cooperative autonomous air combat of multiple unmanned aerial vehicles (UAVs) is one of the main combat modes in future air warfare, and it becomes even more complicated with a highly changeable situation and uncertain information about the opponents. This paper therefore presents a cooperative decision-making method based on an incomplete-information dynamic game to generate maneuver strategies for multiple UAVs in air combat. Firstly, a cooperative situation assessment model is presented to measure the overall combat situation. Secondly, an incomplete-information dynamic game model is proposed to model the dynamic process of air combat, and a dynamic Bayesian network is designed to infer the tactical intention of the opponent. Then, a reinforcement learning framework based on multiagent deep deterministic policy gradient is established to obtain the perfect Bayes-Nash equilibrium solution of the air combat game model. Finally, a series of simulations is conducted to verify the effectiveness of the proposed method, and the simulation results show effective synergies and cooperative tactics.
Funding: Supported in part by the National Natural Science Foundation of China (61771120) and the Fundamental Research Funds for the Central Universities (N171602002).
Abstract: With the explosive growth of high-speed wireless data demand and of the number of mobile devices, fog radio access networks (F-RAN) with a multi-layer network structure have become a hot topic in recent research. Meanwhile, because of the rapid growth of mobile communication traffic, the high cost, and the scarcity of wireless resources, it is especially important to develop an efficient radio resource management mechanism. In this paper, we focus on the problem of resource waste and consider the actual situation of dynamic base station coverage and user requirements. We propose a spectrum pricing and allocation scheme based on a Stackelberg game model under the F-RAN framework, which allocates resources on demand. This scheme studies the double game between the users and the operators, as well as between the traditional operators and the virtual operators, maximizing the profits of the operators. At the same time, spectrum reuse technology is adopted to improve the utilization of network resources. The simulation results verify that the proposed scheme not only avoids resource waste but also effectively improves the operator's revenue efficiency and the overall network resource utilization.
Funding: Supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (USCAST2022-11) and the Aeronautical Science Foundation of China (20220001057001).
Abstract: This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models, together with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. Our method removes the dependence on admissible initial policies, which is one of the main drawbacks of policy iteration (PI)-based frameworks. Furthermore, this algorithm enables the players to adapt their control policies without full knowledge of others' system parameters or control laws. The efficacy of our method is illustrated by three examples.
Funding: This work was supported by the Shanghai Sailing Program (No. 20YF1453000) and the Fundamental Research Funds for the Central Universities (No. 22120200048).
Abstract: In this paper, we consider distributed Nash equilibrium (NE) seeking in potential games over a multi-agent network, where each agent cannot observe the actions of all its rivals. Based on the best response dynamics, we design a distributed NE seeking algorithm by incorporating non-smooth finite-time average tracking dynamics, where each agent only needs to know its own action and exchange information with its neighbours through a communication graph. We give a sufficient condition for the Lipschitz continuity of the best response mapping for potential games, and then prove the convergence of the proposed algorithm based on Lyapunov theory. Numerical simulations are given to verify the results and illustrate the effectiveness of the algorithm.
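The best response dynamics that the algorithm builds on can be illustrated with a minimal centralized sketch. It does not model the paper's distributed setting: there is no communication graph and no finite-time average tracking, and every agent sees the exact total of its rivals' actions. The linear Cournot game, its parameters (`a`, `b`, `c`), and the function names are assumptions chosen because this game is a well-known potential game with a closed-form equilibrium.

```python
from typing import List


def best_response(rivals_total: float, a: float, b: float, c: float) -> float:
    """Quantity maximizing q * (a - b * (q + rivals_total)) - c * q over q >= 0."""
    return max(0.0, (a - c - b * rivals_total) / (2.0 * b))


def best_response_dynamics(n_agents: int, a: float, b: float, c: float,
                           sweeps: int = 200) -> List[float]:
    """Sequential (one agent at a time) best-response updates, starting from zero."""
    q = [0.0] * n_agents
    for _ in range(sweeps):
        for i in range(n_agents):
            rivals_total = sum(q) - q[i]          # information a real agent would have to estimate
            q[i] = best_response(rivals_total, a, b, c)
    return q


# Three firms, inverse demand p = 10 - Q, marginal cost 1 (illustrative values).
actions = best_response_dynamics(n_agents=3, a=10.0, b=1.0, c=1.0)
nash = (10.0 - 1.0) / (1.0 * (3 + 1))            # closed-form symmetric equilibrium
print(actions, nash)
```

Updating agents one at a time amounts to coordinate ascent on the game's potential function, which is why the iterates settle at the equilibrium here; simultaneous updates in the same three-firm game are not guaranteed to converge.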
Funding: Supported by the National Natural Science Foundation of China (71771201, 72531009, 71973001) and the USTC Research Funds of the Double First-Class Initiative (FSSF-A-240202).
Abstract: Social interaction with peer pressure is widely studied in social network analysis. Game theory can be used to model dynamic social interaction, and one class of game network models assumes that people's decision payoff functions hinge on individual covariates and the choices of their friends. However, peer pressure can be misidentified, inducing a non-negligible bias, when incomplete covariates are involved in the game model. For this reason, we develop a generalized constant peer effects model based on a homogeneity structure in dynamic social networks. The new model can effectively avoid bias through homogeneity pursuit and can be applied to a wider range of scenarios. To estimate peer pressure in the model, we first present two algorithms, based on the initialize-expand-merge method and the polynomial-time two-stage method, to estimate the homogeneity parameters. We then apply the nested pseudo-likelihood method and obtain consistent estimators of peer pressure. Simulation evaluations show that the proposed methodology achieves desirable results in terms of the community misclassification rate and the parameter estimation error. We also illustrate the advantages of our model in an empirical analysis by comparing it with a benchmark model.
Funding: The Natural Science Research Fund of Hubei Province (No. 2014BDH121).
Abstract: After building a dynamic evolutionary game model, this paper studies the stability of the equilibrium in the game between commercial banks and closed-loop supply chain (CLSC) enterprises. Through a systematic mechanism designed on the basis of system dynamics theory, the capital chains of independent small and medium-sized enterprises (SMEs) on the CLSC are organically linked together. Moreover, a comparative simulation is carried out for the independent systems before the design and the dependent system after it. The study shows that, as business expands and market risk grows, the independent finance chains of SMEs on the CLSC often exhibit a certain vulnerability, while the SME closed-loop supply chain finance system itself shows strong rigidity and coordination.
Funding: Funded by the Southwest Minzu University 2021 Graduate Innovative Research Master Key Project (320-022142043).
Abstract: With rapid urbanization and industrialization around the world, how to effectively address the rapid demise of traditional villages is a social dilemma faced by all countries, which is why a series of protection regulations have been promulgated in different historical periods. However, the formulation of the relevant policies is still not scientific, universal, or long-term. In this study, we construct an evolutionary game model of local governments and residents based on evolutionary game theory (EGT) to explore the evolutionary stability strategy (ESS) and stability conditions of stakeholders under mutual influence and restriction. The study also analyzes the impact of different influencing factors on the evolutionary tendency of the game model. Numerical simulation examples are used to verify the theoretical results, and three main conclusions are drawn. Firstly, the strategic evolution of stakeholders is a dynamic process of continuous adjustment and optimization, and its results and speed show consistent interdependence. Secondly, the decision-making of stakeholders mainly depends on the basic cost, and a high investment cost is not conducive to the protection of traditional villages. Thirdly, the dynamic evolutionary mechanism composed of different influencing factors affects the direction and speed of stakeholders' decision-making, which provides the basis for them to effectively restrict each other's decisions. This study addresses weaknesses of existing research approaches and provides scientific and novel ideas for the protection of traditional villages, which can contribute to the formulation and improvement of the relevant laws and regulations.
Funding: Supported by the National Natural Science Foundation of China (No. 62063019) and the Natural Science Foundation of Gansu Province (22JR5RA241, 2023CXZX-465).
Abstract: In this study, we construct a bi-level optimization model based on the Stackelberg game and propose a robust optimization algorithm for solving it, assuming a realistic situation with several participants in energy trading. Firstly, the energy trading process between the subjects is analyzed on the basis of an operation framework for multi-agent participation in energy trading. Secondly, the optimal operation model of each energy trading agent is established to develop a bi-level game model that includes every energy participant. Finally, an algorithm combining improved robust optimization over time (ROOT) with CPLEX is proposed to solve the established game model. The experimental results indicate that, under different fitness thresholds, the robust optimization results of the proposed algorithm are improved by 56.91% and 68.54%, respectively. The established bi-level game model effectively balances the benefits of different energy trading entities. The proposed algorithm can increase the income of each participant in the game by an average of 8.59%.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2023R1A2C1005950).
Abstract: Path planning is a fundamental component in robotics and game artificial intelligence that considerably influences the motion efficiency of robots and unmanned aerial vehicles, as well as the realism and immersion of virtual environments. However, traditional algorithms are often limited to single-objective optimization and lack real-time adaptability to dynamic environments. This study addresses these limitations with a real-time dynamic multiobjective (RDMO) path-planning algorithm based on an enhanced A* framework. The proposed algorithm employs a queue-based structure and composite multiheuristic functions to dynamically manage game tasks and compute optimal paths in real time under changing map connectivity. Simulation experiments are conducted using real-world road network data and benchmarked against mainstream hybrid approaches based on genetic algorithms (GAs) and simulated annealing (SA). The results show that the RDMO algorithm is 88 and 73 times faster than the GA- and SA-based solutions, respectively, while the total planned path length is reduced by 58% and 33%, respectively. In addition, the RDMO algorithm responds well to dynamic changes in map connectivity and can replan in real time with minimal computational overhead. These results indicate that the RDMO algorithm provides a robust and efficient solution for multiobjective path planning in games and robotics applications, with potential to improve system performance and user experience in related fields.
Abstract: A method for the parallel computation of the linear quadratic non-cooperative dynamic games problem is proposed. The Lyapunov function is introduced, through which the form adapted to parallel computation of the open-loop Nash equilibrium strategies is gi