Funding: Fully supported by the GUET Excellent Graduate Thesis Program (Grant No. 19YJPYBS03), the Innovation Project of Guangxi Graduate Education (Grant No. YCBZ2022109), and the New Technology Research University Cooperation Project of the 34th Research Institute of China Electronics Technology Group Corporation, 2021 (Grant No. SF2126007).
Abstract: In Software-Defined Networks (SDNs), determining how to efficiently achieve Quality of Service (QoS)-aware routing is challenging but critical for significantly improving network performance, where QoS metrics include, for example, average latency, packet loss ratio, and throughput. The SDN controller can use network statistics and a Deep Reinforcement Learning (DRL) method to address this challenge. In this paper, we formulate dynamic routing in an SDN as a Markov decision process and propose a DRL algorithm, the Asynchronous Advantage Actor-Critic QoS-aware Routing Optimization Mechanism (AQROM), to determine routing strategies that balance the traffic loads in the network. AQROM can improve the QoS of the network and reduce the training time via dynamic routing-strategy updates; that is, the reward function can be dynamically and promptly altered based on the optimization objective, regardless of the network topology and traffic pattern. AQROM can be regarded as a one-step, black-box routing mechanism that operates over high-dimensional input and output sets and handles both discrete and continuous states and actions with respect to SDN operations. Extensive simulations were conducted using OMNeT++, and the results demonstrate that AQROM 1) achieved much faster and more stable convergence than the Deep Deterministic Policy Gradient (DDPG) and Advantage Actor-Critic (A2C) algorithms, 2) incurred a lower packet loss ratio and lower latency than Open Shortest Path First (OSPF), DDPG, and A2C, and 3) yielded higher and more stable throughput than OSPF, DDPG, and A2C.
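AQROM's central idea of swapping the reward function at run time to match the current optimization objective can be illustrated with a small sketch. The Python/NumPy snippet below is a hypothetical reward, not the paper's actual definition: the objective names, weights, and reference scales (50 ms latency, 1 Gbps throughput) are illustrative assumptions.

```python
import numpy as np

def qos_reward(latency_ms, loss_ratio, throughput_mbps, objective="balanced"):
    """Illustrative QoS reward (higher is better). The weights and reference
    scales are assumptions made for this sketch, not AQROM's exact reward;
    the point is that the function can be swapped at run time to match the
    current optimization objective."""
    weights = {
        "balanced":   (1 / 3, 1 / 3, 1 / 3),
        "latency":    (0.8, 0.1, 0.1),
        "throughput": (0.1, 0.1, 0.8),
    }[objective]
    # Normalize each metric into [0, 1] with assumed reference values.
    lat_score = float(np.exp(-latency_ms / 50.0))                    # 50 ms reference
    loss_score = 1.0 - float(np.clip(loss_ratio, 0.0, 1.0))
    thr_score = float(np.clip(throughput_mbps / 1000.0, 0.0, 1.0))   # 1 Gbps reference
    w_lat, w_loss, w_thr = weights
    return w_lat * lat_score + w_loss * loss_score + w_thr * thr_score

if __name__ == "__main__":
    # Same network observation, two different optimization objectives.
    obs = dict(latency_ms=20.0, loss_ratio=0.02, throughput_mbps=600.0)
    print("balanced objective:", qos_reward(**obs))
    print("latency objective :", qos_reward(**obs, objective="latency"))
```

Because only the reward changes, the same actor-critic training loop can be reused unchanged when the operator's objective shifts from, say, balanced QoS to latency minimization.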
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61972025, 61802389, 61672092, U1811264, and 61966009; the National Key Research and Development Program of China under Grant Nos. 2020YFB1005604 and 2020YFB2103802; and the Guangxi Key Laboratory of Trusted Software under Grant No. KX201902.
Abstract: Reinforcement learning, as a form of autonomous learning, is strongly driving artificial intelligence (AI) toward practical applications. Having demonstrated the potential to significantly improve on synchronous parallel learning, the parallel-computing-based asynchronous advantage actor-critic (A3C) algorithm opens a new door for reinforcement learning. Unfortunately, the influence of this acceleration on A3C robustness has been largely overlooked. In this paper, we perform the first robustness assessment of A3C based on parallel computing. By observing the policy's actions, we construct a global matrix of action-probability deviation and define two novel measures, skewness and sparseness, which together form an integral robustness measure. Building on this static assessment, we then develop a dynamic robustness-assessment algorithm through situational whole-space state sampling across changing episodes. Extensive experiments with different combinations of agent count and learning rate were carried out on an A3C-based pathfinding application, demonstrating that the proposed assessment can effectively measure the robustness of A3C, achieving an accuracy of 83.3%.
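The abstract does not give closed-form definitions of the deviation matrix or the skewness and sparseness measures, so the sketch below instantiates them with common-sense stand-ins: absolute action-probability differences over sampled states, sample skewness of the pooled deviations, and the fraction of near-zero deviations. Treat it as a conceptual sketch rather than the paper's exact metrics.

```python
import numpy as np

def deviation_matrix(reference_probs, accelerated_probs):
    """Element-wise deviation between the action probabilities of a reference
    policy and an accelerated (parallel) A3C policy over a set of sampled
    states. Both arrays have shape (num_states, num_actions)."""
    return np.abs(accelerated_probs - reference_probs)

def skewness(dev):
    """Sample skewness of the pooled deviations (an assumed concrete
    definition; the paper defines its own skewness measure)."""
    x = dev.ravel()
    std = x.std()
    return float(((x - x.mean()) ** 3).mean() / (std ** 3 + 1e-12))

def sparseness(dev, eps=1e-3):
    """Fraction of near-zero deviations, used here as an assumed stand-in
    for the paper's sparseness measure."""
    return float(np.mean(dev < eps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.dirichlet(np.ones(4), size=100)   # reference policy over 4 actions
    acc = rng.dirichlet(np.ones(4), size=100)   # accelerated-A3C policy
    dev = deviation_matrix(ref, acc)
    print("skewness  :", skewness(dev))
    print("sparseness:", sparseness(dev))
```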
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1807204 and the BUPT Excellent Ph.D. Students Foundation under Grant CX2022306.
Abstract: Network-assisted full-duplex (NAFD) cell-free (CF) massive MIMO has drawn increasing attention in the evolution toward 6G. In this paper, we build an NAFD CF system in which the users and access points (APs) can flexibly select their duplex modes to increase the link spectral efficiency. We then formulate a joint flexible-duplexing and power-allocation problem to balance user fairness and system spectral efficiency. We further transform the problem into a probabilistic optimization to accommodate short-term communications. In contrast with instantaneous performance optimization, the probabilistic optimization is a sequential decision-making problem, and thus we reformulate it as a Markov Decision Process (MDP). We utilize a deep reinforcement learning (DRL) algorithm to search for a solution in the large state-action space, and propose an asynchronous advantage actor-critic (A3C)-based scheme to reduce the chance of converging to a suboptimal policy. Simulation results demonstrate that the A3C-based scheme is superior to the baseline schemes in terms of complexity, accumulated log spectral efficiency, and stability.
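As a rough illustration of how the joint flexible-duplexing and power-allocation decision might be cast as an MDP action, the sketch below encodes one discrete action per step as a duplex mode plus a quantized power level for every AP, and scores the outcome with an accumulated log-spectral-efficiency reward (the metric reported in the abstract). The flat encoding, the system sizes, and the dummy rates are assumptions made for the sketch, not the paper's formulation.

```python
import numpy as np

NUM_APS = 4          # assumed small system for the sketch
POWER_LEVELS = 5     # assumed discretization of transmit power

def decode_action(action_idx):
    """Map one flat discrete action index to a duplex mode (0 = uplink
    reception, 1 = downlink transmission) and a normalized power level for
    every AP. The flat encoding is an illustrative choice."""
    modes, powers = [], []
    for _ in range(NUM_APS):
        action_idx, mode = divmod(action_idx, 2)
        action_idx, level = divmod(action_idx, POWER_LEVELS)
        modes.append(mode)
        powers.append(level / (POWER_LEVELS - 1))
    return np.array(modes), np.array(powers)

def reward(user_rates):
    """Accumulated log spectral efficiency: a proportional-fairness style
    objective matching the metric reported in the abstract."""
    return float(np.sum(np.log(np.maximum(user_rates, 1e-9))))

if __name__ == "__main__":
    modes, powers = decode_action(action_idx=123)
    rates = np.array([1.2, 0.8, 2.5, 0.4])   # bit/s/Hz per user (dummy values)
    print("duplex modes:", modes, "power levels:", powers)
    print("reward:", reward(rates))
```

The log-rate reward rewards balanced per-user rates rather than a few dominant links, which is one simple way to trade off user fairness against raw sum spectral efficiency.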
Funding: Supported by the Fundamental Research Funds for the Central Universities, China (No. 2024MS115).
Abstract: Non-Orthogonal Multiple Access (NOMA)-assisted Unmanned Aerial Vehicle (UAV) communication is becoming a promising technique for future B5G/6G networks. However, the security of NOMA-UAV networks remains a critical challenge due to the shared wireless spectrum and Line-of-Sight (LoS) channels. This paper formulates a joint UAV trajectory-design and power-allocation problem, aided by a ground jammer, to maximize the sum secrecy rate. First, the joint optimization problem is modeled as a Markov Decision Process (MDP). Then, a Deep Reinforcement Learning (DRL) method is used to search for the optimal policy over the continuous action space. To accelerate sample accumulation, an Asynchronous Advantage Actor-Critic (A3C) scheme with multiple workers is proposed, which reformulates the action and reward to obtain a complete update duration. Simulation results demonstrate that the A3C-based scheme outperforms the baseline schemes in terms of secrecy rate and stability.
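The two ingredients the abstract highlights, a sum-secrecy-rate objective and multiple asynchronous workers, can be sketched as follows. The secrecy-rate function uses the standard [R_user - R_eve]^+ definition; the GlobalNet class, the placeholder gradient, and the dummy rates are illustrative assumptions standing in for the paper's actor-critic networks and UAV channel model.

```python
import threading
import numpy as np

def sum_secrecy_rate(user_rates, eve_rates):
    """Standard sum secrecy rate: [R_user - R_eve]^+ summed over users.
    The channel and rate model feeding it is the paper's and is not
    reproduced here."""
    return float(np.sum(np.maximum(user_rates - eve_rates, 0.0)))

class GlobalNet:
    """Toy stand-in for the shared actor-critic parameters that every
    asynchronous worker reads and updates."""
    def __init__(self, dim=16):
        self.theta = np.zeros(dim)
        self.lock = threading.Lock()

    def apply_gradient(self, grad, lr=1e-3):
        with self.lock:
            self.theta += lr * grad

def worker(global_net, worker_id, episodes=5):
    rng = np.random.default_rng(worker_id)
    for _ in range(episodes):
        # Each worker rolls out its own UAV trajectory / power-allocation
        # episode, scores it with the secrecy-rate reward, and pushes a
        # gradient asynchronously; the gradient below is a placeholder.
        user_rates = rng.uniform(0.5, 3.0, size=4)
        eve_rates = rng.uniform(0.0, 1.5, size=4)
        r = sum_secrecy_rate(user_rates, eve_rates)
        grad = r * rng.normal(size=global_net.theta.shape)
        global_net.apply_gradient(grad)

if __name__ == "__main__":
    net = GlobalNet()
    threads = [threading.Thread(target=worker, args=(net, i)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("parameter norm after asynchronous updates:", np.linalg.norm(net.theta))
```

Running several workers in parallel is what accelerates sample accumulation: each worker contributes updates from its own episodes without waiting for the others to finish.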