Abstract: Communication in the millimeter-wave (mmWave) band, i.e., 30–300 GHz, is characterized by short-range transmissions and the use of antenna beamforming (BF). Multiple mmWave access points (APs) must therefore be installed to fully cover a target environment with gigabits-per-second (Gbps) connectivity. However, inter-beam interference prevents the established concurrent links from reaching their maximum sum rate. In this paper, a reinforcement learning (RL) approach is proposed for enabling mmWave concurrent transmissions by finding beam directions that maximize the long-term average sum rate of the concurrent links. Specifically, the problem is formulated as a multiplayer multiarmed bandit (MAB), where the mmWave APs act as players aiming to maximize their achievable rewards, i.e., data rates, and the arms are the available beam directions. In this setup, a selfish concurrent multiplayer MAB strategy is advocated. Four MAB algorithms, namely ε-greedy, upper confidence bound (UCB), Thompson sampling (TS), and the exponential-weight algorithm for exploration and exploitation (EXP3), are examined by employing each of them in every AP to selfishly improve its beam selection based only on its own past observations. After a few rounds of interaction, the mmWave APs learn to select concurrent beams that enhance the overall system performance. The proposed MAB-based mmWave concurrent BF shows performance comparable to the optimal solution.
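To make the selfish multiplayer strategy concrete, the following Python sketch shows each AP running its own UCB1 learner over its candidate beam directions and updating it only from the rate it observes locally. The class and function names, the reward placeholder, and all parameters are illustrative assumptions, not the authors' implementation.

import math
import random


class UCB1BeamSelector:
    """One AP's learner; arms are the indices of its candidate beam directions."""

    def __init__(self, num_beams):
        self.counts = [0] * num_beams    # times each beam has been tried
        self.values = [0.0] * num_beams  # empirical mean rate per beam

    def select_beam(self, t):
        # Try every beam once, then pick the beam with the largest UCB1 index.
        for b, n in enumerate(self.counts):
            if n == 0:
                return b
        return max(
            range(len(self.counts)),
            key=lambda b: self.values[b] + math.sqrt(2 * math.log(t) / self.counts[b]),
        )

    def update(self, beam, rate):
        # Incremental mean update from the locally observed, interference-limited rate.
        self.counts[beam] += 1
        self.values[beam] += (rate - self.values[beam]) / self.counts[beam]


def observed_rate(chosen_beams, ap_index):
    # Placeholder environment: the real rate would come from the SINR under
    # inter-beam interference; a random stand-in keeps the sketch self-contained.
    return random.random()


if __name__ == "__main__":
    num_aps, num_beams, rounds = 3, 8, 1000
    aps = [UCB1BeamSelector(num_beams) for _ in range(num_aps)]
    for t in range(1, rounds + 1):
        beams = [ap.select_beam(t) for ap in aps]         # concurrent beam choices
        for i, ap in enumerate(aps):
            ap.update(beams[i], observed_rate(beams, i))  # selfish, local update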
Funding: This work is funded by the Deanship of Scientific Research (DSR), University of Jeddah, under Grant No. UJ-22-DR-1.
Abstract: The static nature of cyber defense systems gives attackers ample time to explore and then exploit the vulnerabilities of information technology systems. In this paper, we investigate a problem in which multiagent systems sensing and acting in an environment contribute to adaptive cyber defense. We present a learning strategy that enables multiple agents to learn optimal policies using multiagent reinforcement learning (MARL). Our proposed approach is inspired by the multiarmed bandit (MAB) learning technique, allowing multiple agents to cooperate in decision making or to work independently. We study a MAB approach in which defenders visit a system multiple times in an alternating fashion to maximize their rewards and protect their system. We find that this game can be modeled, from an individual player's perspective, as a restless MAB problem. We derive further results for the case in which the MAB takes the form of a pure birth process, including a myopic optimal policy, and we identify environments that provide the incentives required for cooperation in multiplayer projects.
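As an illustration of the restless-MAB view, the Python sketch below implements a myopic defender that always visits the system with the largest immediate expected reward, while the unattended systems' exposure keeps growing in a pure-birth fashion. The state dynamics, reward, and reset rule are assumptions made for illustration and are not the paper's model.

import random


def step_birth(state, birth_prob=0.3):
    # An unattended system's exposure can only stay the same or increase (pure-birth flavour).
    return state + (1 if random.random() < birth_prob else 0)


def myopic_defense(num_systems=4, horizon=50, seed=0):
    random.seed(seed)
    exposure = [0] * num_systems
    total_reward = 0
    for _ in range(horizon):
        # Myopic rule: visit the system with the largest immediate reward,
        # here simply the system with the highest current exposure.
        target = max(range(num_systems), key=lambda s: exposure[s])
        total_reward += exposure[target]  # reward for clearing the accumulated exposure
        exposure[target] = 0              # the visited system is restored
        # The other arms are restless: they keep evolving even though they were not played.
        for s in range(num_systems):
            if s != target:
                exposure[s] = step_birth(exposure[s])
    return total_reward


if __name__ == "__main__":
    print("Total reward collected by the myopic defender:", myopic_defense())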