Funding: the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2018AAA0100901).
Abstract: In repeated zero-sum games, a player who learns to exploit the opponent from historical interactions can typically obtain a higher utility than one who constantly plays an equilibrium strategy of the stage game. However, against a fully adaptive opponent, it is difficult to identify the opponent's adaptive dynamics and then exploit its potential weaknesses. In this paper, we study the problem of optimizing against an adaptive opponent that uses no-regret learning, a classic and widely used family of adaptive learning algorithms. We propose a general framework for modeling no-regret opponents online and exploiting their weaknesses. Within this framework, one can approximate the opponent's no-regret learning dynamics and then derive a response plan that earns a significant profit based on the inferred opponent strategies. We employ two system identification architectures, a recurrent neural network (RNN) and a nonlinear autoregressive exogenous (NARX) model, and adopt an efficient greedy response plan within the framework. Theoretically, we prove that our RNN architecture can approximate specific no-regret dynamics. Empirically, we demonstrate that when the interaction has a low level of non-stationarity, our architectures approximate the dynamics with low error, and the derived policies exploit the no-regret opponent to obtain a decent utility.
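The following is a minimal sketch of the greedy response step only. It rests on simplifying assumptions that are not taken from the paper. The opponent runs Hedge (multiplicative weights, one standard no-regret algorithm) in Rock-Paper-Scissors. Its learning rate `eta` is treated as known, so its next mixed strategy can be predicted exactly rather than learned by an RNN or NARX model. The names `hedge_strategy` and `exploit_hedge` are hypothetical.

```python
import math

# Row player's payoffs in Rock-Paper-Scissors (zero-sum, game value 0).
# A[i][j] is our payoff when we play i and the opponent plays j
# (0 = Rock, 1 = Paper, 2 = Scissors).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def hedge_strategy(cum_payoffs, eta):
    """Mixed strategy of a Hedge (multiplicative-weights) learner."""
    m = max(cum_payoffs)  # shift for numerical stability; probabilities unchanged
    weights = [math.exp(eta * (u - m)) for u in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def exploit_hedge(T=200, eta=0.1):
    """Greedy response to a Hedge opponent whose update rule is assumed known.

    Returns our average expected payoff per round.
    """
    cum = [0.0, 0.0, 0.0]  # opponent's cumulative payoff for each of its actions
    total = 0.0
    for _ in range(T):
        y = hedge_strategy(cum, eta)  # predicted opponent mixed strategy
        values = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]
        i = max(range(3), key=values.__getitem__)  # greedy best response
        total += values[i]
        # Full-information feedback: the opponent credits each of its actions j
        # with the payoff -A[i][j] it would have earned against our action i.
        for j in range(3):
            cum[j] -= A[i][j]
    return total / T


print(exploit_hedge())  # positive, versus 0 for always playing the uniform equilibrium
```

Playing the uniform equilibrium earns 0 per round here. Against a non-uniform predicted strategy, the greedy best response earns strictly more than 0. Once the opponent drifts away from uniform, the average therefore becomes positive.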
Funding: We also acknowledge support from the National Natural Science Foundation of China (Grant No. 60574071).
Abstract: Considering the dynamic character of repeated games and Markov processes, this paper presents a novel dynamic decision model for symmetric repeated games. In this model, players' actions are mapped to a Markov decision process with payoffs, and the Boltzmann distribution is introduced. Our dynamic model differs from existing ones. We use it to study the iterated prisoner's dilemma, and the results show that this decision model can be applied successfully to symmetric repeated games and is capable of adaptive learning.
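The abstract does not give the model's update rule, so the following is only a sketch under assumed details. The state of the Markov decision process is the previous round's joint outcome. Actions are drawn from a Boltzmann (softmax) distribution over Q-values with a hypothetical `temperature`. The Q-values follow standard Q-learning against a Tit-for-Tat co-player with the conventional payoffs R=3, S=0, T=5, P=1. All names are illustrative.

```python
import math
import random

ACTIONS = ("C", "D")
# MDP states: the previous round's joint outcome, written (our move, their move).
STATES = ("CC", "CD", "DC", "DD")
# Our stage payoff for each outcome (R=3, S=0, T=5, P=1).
PAYOFF = {"CC": 3.0, "CD": 0.0, "DC": 5.0, "DD": 1.0}


def boltzmann(values, temperature):
    """Boltzmann (softmax) distribution over action values; temperature > 0."""
    m = max(values)  # shift for numerical stability; probabilities unchanged
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def play_against_tft(rounds=5000, temperature=0.5, alpha=0.1, gamma=0.9, seed=0):
    """Q-learning with Boltzmann action selection against Tit-for-Tat.

    Returns (cooperation rate, learned Q-table).
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    state, cooperations = "CC", 0
    for _ in range(rounds):
        probs = boltzmann([q[state][a] for a in ACTIONS], temperature)
        ours = rng.choices(ACTIONS, weights=probs, k=1)[0]
        theirs = state[0]  # Tit-for-Tat repeats our previous move
        nxt = ours + theirs
        target = PAYOFF[nxt] + gamma * max(q[nxt].values())
        q[state][ours] += alpha * (target - q[state][ours])
        state = nxt
        cooperations += (ours == "C")
    return cooperations / rounds, q
```

A lower `temperature` makes the choice nearly greedy, and a higher one pushes play toward uniform. This is how the Boltzmann distribution trades exploitation against exploration.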
Abstract: In this paper, we characterize the players' behavior in the stock market with a repeated game model with asymmetric information. We show that the discounted price process of the stock is a martingale driven by Brownian motion. We also give an endogenous explanation for the random fluctuation of stock prices: the randomness in the market stems from randomization in the strategy of the informed player, who hopes to avoid revealing his private information. On this basis, by further studying the corresponding option pricing problem, we give an expression for the function φ.
Abstract: The aim of this paper is to reveal the mechanism of compromise and change in coordination when players agree in general but disagree on coordination methods. When players agree on the need to collaborate but conflict over the specific method, one player must always compromise. In game theory this situation is known as the Battle of the Sexes. It has long been believed that once an agreement is reached under such circumstances, the players have no incentive to withdraw from it. However, this study shows that this belief does not always hold if the players are able to revise the outcome of their negotiations later. A wide range of fields use game theory as a framework for analyzing the success or failure of coordination. Compared with the possibility of betrayal illustrated by the well-known Prisoner's Dilemma, however, conflict over the specific method of coordination has rarely been discussed, even though such situations are often observed in today's interdependent world. The repeated Battle of the Sexes games presented in this study provide a useful framework for analyzing such conflicts.
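The following sketch uses illustrative payoffs, which are hypothetical numbers and not from the paper. It enumerates the pure-strategy Nash equilibria of the one-shot Battle of the Sexes. The result shows why one player must compromise: there are two equilibria, and each favors a different player.

```python
# Battle of the Sexes with hypothetical payoffs (player 1, player 2):
# both prefer coordinating, but player 1 prefers method A and player 2 method B.
METHODS = ("A", "B")
PAYOFFS = {
    ("A", "A"): (2, 1),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every action profile in which neither player gains by deviating alone."""
    equilibria = []
    for a1 in actions:
        for a2 in actions:
            u1, u2 = payoffs[(a1, a2)]
            best1 = all(u1 >= payoffs[(d, a2)][0] for d in actions)
            best2 = all(u2 >= payoffs[(a1, d)][1] for d in actions)
            if best1 and best2:
                equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, METHODS))  # [('A', 'A'), ('B', 'B')]
```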
Funding: Supported by the Scientific Research Initiation (Supporting) Funds of Northwest A&F University.
Abstract: This paper establishes a game model of environmental protection at scenic spots in order to analyze the equilibria of the one-time game and the infinitely repeated game, reveal why the environment of scenic spots is destroyed, and propose countermeasures that ensure the equilibrium of the game. The study shows that in the one-time game between tourists and tour operators, not controlling environmental pollution is the tour operators' dominant strategy, and this destroys the environment at scenic spots. In the infinitely repeated game, by contrast, reaching the Pareto-optimal equilibrium (tourists travel and tour operators control environmental pollution) depends on whether the players (tourists or operators) adopt trigger strategies (traveling or controlling environmental pollution). Government supervision of operators can force them to control environmental pollution. This in turn improves the efficiency of the game's equilibrium and promotes environmental protection at tourist scenic spots and the sustainable development of tourism.
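The abstract gives no payoffs, so this sketch of the trigger-strategy logic uses hypothetical per-period payoffs for the operator. Tourists keep visiting while the operator controls pollution and stop forever after the first violation, which is a grim trigger. An optional government `fine` models supervision. Let r be revenue, c the control cost and F the fine. Controlling pollution is then sustainable when the discount factor δ satisfies (r - c)/(1 - δ) >= r - F.

```python
from fractions import Fraction


def critical_discount_factor(revenue, control_cost, fine=0):
    """Smallest discount factor at which the operator keeps controlling pollution
    against a tourist grim-trigger strategy (hypothetical payoff model).

    Assumes revenue > control_cost. Controlling earns revenue - control_cost per
    period; polluting earns revenue - fine once, then 0 forever because tourists leave.
    """
    cooperate = Fraction(revenue - control_cost)
    deviate = Fraction(revenue - fine)
    if deviate <= cooperate:
        return Fraction(0)  # controlling is already a best response in the one-shot game
    # cooperate / (1 - delta) >= deviate  <=>  delta >= 1 - cooperate / deviate
    return 1 - cooperate / deviate


print(critical_discount_factor(10, 4))          # 2/5
print(critical_discount_factor(10, 4, fine=4))  # 0
```

With revenue 10 and control cost 4, the threshold is δ* = 2/5. A fine equal to the control cost makes controlling pollution a best response even in the one-shot game. This mirrors the abstract's point about government supervision.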
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61004098 and 11222543); the Program for New Century Excellent Talents in Universities of China (Grant No. NCET-11-0070); the Special Project of Youth Science and Technology Innovation Research Team of Sichuan Province, China (Grant No. 2013TD0006); and the Research Foundation of UESTC and Scholars Program of Hong Kong (Grant No. G-YZ4D).
Abstract: Repeated games describe situations in which players interact with each other dynamically and make decisions according to the outcomes of previous stage games. Very recently, Press and Dyson revealed a new class of zero-determinant (ZD) strategies for repeated games. These strategies can enforce a fixed linear relationship between the expected payoffs of two players, indicating that a smart player can unilaterally control her unwitting co-player's payoff [Proc. Natl. Acad. Sci. USA 109, 10409 (2012)]. The theory of ZD strategies provides a novel viewpoint for describing interactions among players and fundamentally changes the research paradigm of game theory. In this brief survey, we first introduce the mathematical framework of ZD strategies. We then review the properties and constraints of two specific classes of ZD strategies, called pinning strategies and extortion strategies. Next, we review representative research progress, including robustness analysis, cooperative ZD strategy analysis, and evolutionary stability analysis. Finally, we discuss significant extensions of ZD strategies, including multi-player ZD strategies and ZD strategies under noise, and list challenges in related research fields.
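The fixed linear relationship enforced by an extortion strategy can be checked numerically. This sketch makes three assumptions:

- the game is the iterated prisoner's dilemma with R, S, T, P = 3, 0, 5, 1;
- both players use memory-one strategies;
- S_X and S_Y are the two players' payoff vectors over the outcomes (CC, CD, DC, DD), and p̃ = p - (1, 1, 0, 0).

It builds Press and Dyson's extortionate strategy p̃ = φ[(S_X - P) - χ(S_Y - P)] and then verifies s_X - P = χ(s_Y - P) against arbitrary opponents. The helper names are illustrative, and NumPy is assumed.

```python
import numpy as np

# Iterated prisoner's dilemma with the conventional payoffs.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# Payoff vectors over X's state order (CC, CD, DC, DD); the first letter is X's move.
SX = np.array([R, S, T, P])
SY = np.array([R, T, S, P])


def stationary_payoffs(p, q):
    """Long-run average payoffs (s_X, s_Y) of memory-one strategies p (X) and q (Y).

    q is given in Y's own state order, so its CD and DC entries are swapped
    when mapped onto X's ordering. Assumes a unique stationary distribution.
    """
    p = np.asarray(p, dtype=float)
    qx = np.asarray(q, dtype=float)[[0, 2, 1, 3]]
    # Row i: current state; columns: next state CC, CD, DC, DD.
    M = np.column_stack([p * qx, p * (1 - qx), (1 - p) * qx, (1 - p) * (1 - qx)])
    A = M.T - np.eye(4)
    A[-1, :] = 1.0  # replace one redundant equation with sum(v) = 1
    v = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
    return v @ SX, v @ SY


def extortion_strategy(chi, phi):
    """Extortionate ZD strategy enforcing s_X - P = chi * (s_Y - P)."""
    p = phi * ((SX - P) - chi * (SY - P)) + np.array([1.0, 1.0, 0.0, 0.0])
    if np.any(p < 0) or np.any(p > 1):
        raise ValueError("phi too large for these payoffs")
    return p


p = extortion_strategy(3.0, 1.0 / 26.0)  # (11/13, 1/2, 7/26, 0)
sx, sy = stationary_payoffs(p, [0.9, 0.2, 0.6, 0.3])
print(sx - P, 3.0 * (sy - P))  # equal up to rounding
```

The check holds for any opponent q with entries strictly between 0 and 1, because in the stationary distribution v · p̃ = 0.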
Funding: This work was supported by the National Key Research and Development Program (No. 2016YFB0901900) and the National Natural Science Foundation of China (No. 61733018). The authors would like to thank Prof. Peng Yi for his helpful suggestions.
Abstract: In this paper, we consider learning the inherent probability distribution of types via knowledge transfer in a two-player repeated Bayesian game, a basic model in network security. In the Bayesian game, the attacker's distribution of types is unknown to the defender, who aims to reconstruct the distribution from historical actions. Calculating the distribution of types directly is difficult, because the distribution is coupled with a prediction function of the attacker in the game model. We therefore seek help from an interrelated complete-information game, based on the idea of transfer learning. We provide two methods for estimating the prediction function with knowledge transfer under different concrete conditions. After obtaining the estimated prediction function, the defender can decouple the inherent distribution from the prediction function in the Bayesian game and reconstruct the distribution of the attacker's types. Finally, we give numerical examples to illustrate the effectiveness of our methods.
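Once an estimated prediction function is available, reconstructing the type distribution from historical actions becomes a mixture-weight estimation problem. The abstract does not say which estimator the paper uses. This sketch swaps in the standard expectation-maximization (EM) update for mixture weights. It assumes `pred[k][a]`, the probability that a type-k attacker plays action a, is already known. The function name is hypothetical.

```python
def estimate_type_distribution(pred, counts, iters=500):
    """EM estimate of the attacker's type distribution.

    pred[k][a]: probability that a type-k attacker plays action a
                (the estimated prediction function, assumed given).
    counts[a]:  number of times action a appears in the history.
    """
    n_types = len(pred)
    total = sum(counts)
    pi = [1.0 / n_types] * n_types  # start from the uniform distribution
    for _ in range(iters):
        new_pi = [0.0] * n_types
        for a, n_a in enumerate(counts):
            if n_a == 0:
                continue
            mix = sum(pi[k] * pred[k][a] for k in range(n_types))
            # E-step: posterior responsibility of each type for action a.
            for k in range(n_types):
                new_pi[k] += n_a * pi[k] * pred[k][a] / mix
        pi = [x / total for x in new_pi]  # M-step: normalized expected counts
    return pi


pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(estimate_type_distribution(pred, [280, 270, 450]))  # close to [0.3, 0.7]
```

The example counts are exactly the action frequencies produced by the type distribution (0.3, 0.7). The estimate therefore converges to it, because the rows of `pred` are linearly independent.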
Funding: supported in part by the National Science Fund for Distinguished Young Scholars under Grant No. 60725105; the National Key Basic Research Program of China (973 Program) under Grant No. 2009CB320404; the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT0852; the National Natural Science Foundation of China under Grant Nos. 60972047 and 61072068; and the 111 Project under Grant No. B08038.
Abstract: This paper proposes a negotiation-based TDMA scheme for ad hoc networks, modeled as an asynchronous myopic repeated game. Unlike traditional centralized TDMA schemes, our scheme operates in a decentralized manner and scales well under topology changes. Simulation results show that, in terms of coloring quality, our scheme performs close to classical centralized algorithms with much lower complexity.
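This is a minimal sketch of decentralized slot assignment as graph coloring. It rests on assumptions that the abstract does not state. Two nodes within two hops may not share a TDMA slot. Each node acts asynchronously and myopically, moving to the lowest free slot only when its current slot collides. The paper's actual negotiation protocol is not reproduced here, and `two_hop_conflicts` and `negotiate_slots` are hypothetical.

```python
import random


def two_hop_conflicts(adj):
    """Nodes within two hops may not share a slot (hidden-terminal constraint)."""
    conflicts = {v: set() for v in adj}
    for v in adj:
        for u in adj[v]:
            conflicts[v].add(u)
            conflicts[v].update(w for w in adj[u] if w != v)
    return conflicts


def negotiate_slots(adj, seed=0):
    """One asynchronous pass of myopic best responses over TDMA slots."""
    rng = random.Random(seed)
    conflicts = two_hop_conflicts(adj)
    slot = {v: 0 for v in adj}  # cold start: every node claims slot 0
    order = list(adj)
    rng.shuffle(order)  # nodes act one at a time in arbitrary order
    for v in order:
        taken = {slot[u] for u in conflicts[v]}
        if slot[v] in taken:  # move only when the current slot collides
            slot[v] = next(s for s in range(len(taken) + 1) if s not in taken)
    return slot


line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(negotiate_slots(line))
```

A node that has resolved its collision stays collision-free, because later movers only pick slots that are free in their own neighborhoods. One pass therefore yields a collision-free schedule using at most (maximum conflict degree + 1) slots.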
Funding: the National Defense Basic Research Foundation of China (C2720061361).
Abstract: Focusing on dropping packets attacks (DPA) in sensor networks, we model resistance to such attacks as a repeated game, under the assumption that sensor nodes are rational. The model deters malicious nodes from attacking by establishing a punishment mechanism and drives the sensor network toward a collaborative Nash equilibrium. Simulation results show that, with reasonably chosen configuration parameters, the model can effectively resist DPA.
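The abstract does not specify the model, so this sketch of how a punishment mechanism's parameters deter dropping uses a one-shot-deviation check with hypothetical payoffs:

- dropping a packet saves a node `saving` in the current round;
- neighbors then refuse to forward its packets for k rounds, costing it `benefit` per round;
- future losses are discounted by `delta`.

Dropping becomes unprofitable once the discounted loss covers the saving. The function name is illustrative.

```python
def min_punishment_length(saving, benefit, delta, max_len=1000):
    """Smallest number of punishment rounds that makes dropping unprofitable.

    Compares the one-round saving from dropping with the discounted loss
    benefit * (delta + delta**2 + ... + delta**k). Returns None if no
    length up to max_len suffices.
    """
    loss = 0.0
    for k in range(1, max_len + 1):
        loss += benefit * delta ** k
        if loss >= saving:
            return k
    return None


print(min_punishment_length(1.0, 0.5, 0.9))   # 3
print(min_punishment_length(1.0, 0.05, 0.9))  # None
```

If benefit * delta / (1 - delta) < saving, no punishment length deters a rational dropper. This is why the configuration parameters have to be chosen together.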
Abstract: Using the fundamental theory and analysis methods of repeated games with incomplete information, this paper introduces incomplete information into repeated games and establishes a two-stage dynamic game model between the local authority and the coal mine owner. The analysis indicates that as long as the state establishes a corresponding reward-and-punishment incentive mechanism for the responsible local authority departments, the local authority will report coal mine safety accidents on time. The conclusion about whether the local government behaves cooperatively changes once incomplete information is introduced. Unsafe accidents can be controlled effectively only when the local authority fulfills its responsibility. Once the local government cooperates in this way, the state's cost and difficulty of safety supervision can be greatly reduced.
Funding: supported by the National Key Research and Development Program of China under Grant No. 2022YFA1004700, the National Natural Science Foundation of China under Grant No. 62173250, and the Shanghai Municipal Science and Technology Major Project under Grant No. 2021SHZDZX0100.
Abstract: This paper focuses on the performance of equalizer zero-determinant (ZD) strategies in discounted repeated Stackelberg asymmetric games. In the leader-follower adversarial scenario, the strong Stackelberg equilibrium (SSE), derived from the opponents' best response (BR), is technically the optimal strategy for the leader. However, computing an SSE strategy may be difficult, since it requires solving a mixed-integer program whose complexity is exponential in the number of states. To address this, the authors propose an equalizer ZD strategy, which can unilaterally restrict the opponent's expected utility. The authors first study the existence of an equalizer ZD strategy in one-to-one settings and analyze an upper bound on its performance relative to the baseline SSE strategy. They then turn to multi-player models in which one player adopts an equalizer ZD strategy, give bounds on the weighted sum of the opponents' utilities, and compare it with the SSE strategy. Finally, the authors present simulations on unmanned aerial vehicles (UAVs) and moving target defense (MTD) to verify the effectiveness of the proposed approach.
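The paper works with discounted, asymmetric Stackelberg games. This sketch illustrates only the basic equalizer mechanism, in the simpler undiscounted iterated prisoner's dilemma with R, S, T, P = 3, 0, 5, 1 and memory-one strategies. There, Press and Dyson's construction p̃ = β S_Y + γ 1 with γ = -β · target pins the opponent's long-run payoff at `target`, whatever the opponent does. Here p̃ = p - (1, 1, 0, 0), and valid targets lie between P and R. The helper names are illustrative, and NumPy is assumed.

```python
import numpy as np

# Iterated prisoner's dilemma with the conventional payoffs.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
SX = np.array([R, S, T, P])  # X's payoffs over X's state order (CC, CD, DC, DD)
SY = np.array([R, T, S, P])  # Y's payoffs over the same order


def stationary_payoffs(p, q):
    """Long-run average payoffs of memory-one strategies p (X) and q (Y's own order)."""
    p = np.asarray(p, dtype=float)
    qx = np.asarray(q, dtype=float)[[0, 2, 1, 3]]  # map Y's CD/DC onto X's order
    M = np.column_stack([p * qx, p * (1 - qx), (1 - p) * qx, (1 - p) * (1 - qx)])
    A = M.T - np.eye(4)
    A[-1, :] = 1.0  # stationary distribution must sum to 1
    v = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
    return v @ SX, v @ SY


def equalizer_strategy(target, beta):
    """Equalizer ZD strategy forcing beta * s_Y + gamma = 0, i.e. s_Y = target."""
    p = beta * SY - beta * target + np.array([1.0, 1.0, 0.0, 0.0])
    if np.any(p < 0) or np.any(p > 1):
        raise ValueError("(target, beta) does not give valid probabilities")
    return p


p = equalizer_strategy(2.0, -0.25)  # (0.75, 0.25, 0.5, 0.25)
for q in ([0.9, 0.2, 0.6, 0.3], [0.1, 0.8, 0.4, 0.7]):
    print(stationary_payoffs(p, q)[1])  # 2.0 up to rounding, for either opponent
```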