Deep Reinforcement Learning(DRL)is a class of Machine Learning(ML)that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environmen...Deep Reinforcement Learning(DRL)is a class of Machine Learning(ML)that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its efforts in the future efficiently.DRL has been used in many application fields,including games,robots,networks,etc.for creating autonomous systems that improve themselves with experience.It is well acknowledged that DRL is well suited to solve optimization problems in distributed systems in general and network routing especially.Therefore,a novel query routing approach called Deep Reinforcement Learning based Route Selection(DRLRS)is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm.The main objective of this approach is to achieve better retrieval effectiveness with reduced searching cost by less number of connected peers,exchangedmessages,and reduced time.The simulation results shows a significantly improve searching a resource with compression to k-Random Walker and Directed BFS.Here,retrieval effectiveness,search cost in terms of connected peers,and average overhead are 1.28,106,149,respectively.展开更多
To support dramatically increased traffic loads,communication networks become ultra-dense.Traditional cell association(CA)schemes are timeconsuming,forcing researchers to seek fast schemes.This paper proposes a deep Q...To support dramatically increased traffic loads,communication networks become ultra-dense.Traditional cell association(CA)schemes are timeconsuming,forcing researchers to seek fast schemes.This paper proposes a deep Q-learning based scheme,whose main idea is to train a deep neural network(DNN)to calculate the Q values of all the state-action pairs and the cell holding the maximum Q value is associated.In the training stage,the intelligent agent continuously generates samples through the trial-anderror method to train the DNN until convergence.In the application stage,state vectors of all the users are inputted to the trained DNN to quickly obtain a satisfied CA result of a scenario with the same BS locations and user distribution.Simulations demonstrate that the proposed scheme provides satisfied CA results in a computational time several orders of magnitudes shorter than traditional schemes.Meanwhile,performance metrics,such as capacity and fairness,can be guaranteed.展开更多
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of the fifth-generation (5G) mobile networks. Cloud radio access network (CRAN) is a promi...The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of the fifth-generation (5G) mobile networks. Cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet the above requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance the energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build up a 3-layer CNN to capture the environment feature as an input state space. We then use DQN to turn on/off the RRHs dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct the simulation to compare our proposed scheme with nature DQN and the traditional approach.展开更多
The controller is a main component in the Software-Defined Networking(SDN)framework,which plays a significant role in enabling programmability and orchestration for 5G and next-generation networks.In SDN,frequent comm...The controller is a main component in the Software-Defined Networking(SDN)framework,which plays a significant role in enabling programmability and orchestration for 5G and next-generation networks.In SDN,frequent communication occurs between network switches and the controller,which manages and directs traffic flows.If the controller is not strategically placed within the network,this communication can experience increased delays,negatively affecting network performance.Specifically,an improperly placed controller can lead to higher end-to-end(E2E)delay,as switches must traverse more hops or encounter greater propagation delays when communicating with the controller.This paper introduces a novel approach using Deep Q-Learning(DQL)to dynamically place controllers in Software-Defined Internet of Things(SD-IoT)environments,with the goal of minimizing E2E delay between switches and controllers.E2E delay,a crucial metric for network performance,is influenced by two key factors:hop count,which measures the number of network nodes data must traverse,and propagation delay,which accounts for the physical distance between nodes.Our approach models the controller placement problem as a Markov Decision Process(MDP).In this model,the network configuration at any given time is represented as a“state,”while“actions”correspond to potential decisions regarding the placement of controllers or the reassignment of switches to controllers.Using a Deep Q-Network(DQN)to approximate the Q-function,the system learns the optimal controller placement by maximizing the cumulative reward,which is defined as the negative of the E2E delay.Essentially,the lower the delay,the higher the reward the system receives,enabling it to continuously improve its controller placement strategy.The experimental results show that our DQL-based method significantly reduces E2E delay when compared to traditional benchmark placement strategies.By dynamically learning from the network’s real-time conditions,the proposed method ensures that controller placement remains efficient and responsive,reducing communication delays and enhancing overall network performance.展开更多
Collaborative vehicular networks is a key enabler to meet the stringent ultra-reliable and lowlatency communications(URLLC)requirements.A user vehicle(UV)dynamically optimizes task offloading by exploiting its collabo...Collaborative vehicular networks is a key enabler to meet the stringent ultra-reliable and lowlatency communications(URLLC)requirements.A user vehicle(UV)dynamically optimizes task offloading by exploiting its collaborations with edge servers and vehicular fog servers(VFSs).However,the optimization of task offloading in highly dynamic collaborative vehicular networks faces several challenges such as URLLC guaranteeing,incomplete information,and dimensionality curse.In this paper,we first characterize URLLC in terms of queuing delay bound violation and high-order statistics of excess backlogs.Then,a Deep Reinforcement lEarning-based URLLCAware task offloading algorithM named DREAM is proposed to maximize the throughput of the UVs while satisfying the URLLC constraints in a besteffort way.Compared with existing task offloading algorithms,DREAM achieves superior performance in throughput,queuing delay,and URLLC.展开更多
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture ...A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. A robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait which was generated in advance foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on the uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm was validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tile angle is less than 3° after implementing the proposed method and the walking stability is obviously improved.展开更多
In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role ...In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role in enhancing the effectiveness,efficiency,and reliability of robotic systems.This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning(DRL).Through the utilization of advanced DRL algorithms,such as Q-Learning,Deep Q-Networks(DQN),Policy Gradient Methods,and Proximal Policy Optimization(PPO),the study aims to improve the performance of robotic arms in grasping objects with varying shapes,sizes,and environmental conditions.The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization,emphasizing the strengths and weaknesses of each algorithm.It also presents a comprehensive framework for training the DRL models,including simulation environment setup,the optimization process,and the evaluation metrics for grasping success.The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks.The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.展开更多
In this study,a solution based on deep Q network(DQN)is proposed to address the relay selection problem in cooperative non-orthogonal multiple access(NOMA)systems.DQN is particularly effective in addressing problems w...In this study,a solution based on deep Q network(DQN)is proposed to address the relay selection problem in cooperative non-orthogonal multiple access(NOMA)systems.DQN is particularly effective in addressing problems within dynamic and complex communication environ-ments.By formulating the relay selection problem as a Markov decision process(MDP),the DQN algorithm employs deep neural networks(DNNs)to learn and make decisions through real-time interactions with the communication environment,aiming to minimize the system’s outage proba-bility.During the learning process,the DQN algorithm progressively acquires channel state infor-mation(CSI)between two nodes,thereby minimizing the system’s outage probability until a sta-ble level is reached.Simulation results show that the proposed method effectively reduces the out-age probability by 82%compared to the two-way relay selection scheme(Two-Way)when the sig-nal-to-noise ratio(SNR)is 30 dB.This study demonstrates the applicability and advantages of the DQN algorithm in cooperative NOMA systems,providing a novel approach to addressing real-time relay selection challenges in dynamic communication environments.展开更多
针对深度Q网络(DQN)算法因过估计导致收敛稳定性差的问题,在传统时序差分(TD)的基础上提出N阶TD误差的概念,设计基于二阶TD误差的双网络DQN算法。构造基于二阶TD误差的值函数更新公式,同时结合DQN算法建立双网络模型,得到两个同构的值...针对深度Q网络(DQN)算法因过估计导致收敛稳定性差的问题,在传统时序差分(TD)的基础上提出N阶TD误差的概念,设计基于二阶TD误差的双网络DQN算法。构造基于二阶TD误差的值函数更新公式,同时结合DQN算法建立双网络模型,得到两个同构的值函数网络分别用于表示先后两轮的值函数,协同更新网络参数,以提高DQN算法中值函数估计的稳定性。基于Open AI Gym平台的实验结果表明,在解决Mountain Car和Cart Pole问题方面,该算法较经典DQN算法具有更好的收敛稳定性。展开更多
基金Authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work under Project No.g01/n04.
文摘Deep Reinforcement Learning(DRL)is a class of Machine Learning(ML)that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its efforts in the future efficiently.DRL has been used in many application fields,including games,robots,networks,etc.for creating autonomous systems that improve themselves with experience.It is well acknowledged that DRL is well suited to solve optimization problems in distributed systems in general and network routing especially.Therefore,a novel query routing approach called Deep Reinforcement Learning based Route Selection(DRLRS)is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm.The main objective of this approach is to achieve better retrieval effectiveness with reduced searching cost by less number of connected peers,exchangedmessages,and reduced time.The simulation results shows a significantly improve searching a resource with compression to k-Random Walker and Directed BFS.Here,retrieval effectiveness,search cost in terms of connected peers,and average overhead are 1.28,106,149,respectively.
基金This work was supported by the Fundamental Research Funds for the Central Universities of China under grant no.PA2019GDQT0012by National Natural Science Foundation of China(Grant No.61971176)by the Applied Basic Research Program ofWuhan City,China,under grand 2017010201010117.
文摘To support dramatically increased traffic loads,communication networks become ultra-dense.Traditional cell association(CA)schemes are timeconsuming,forcing researchers to seek fast schemes.This paper proposes a deep Q-learning based scheme,whose main idea is to train a deep neural network(DNN)to calculate the Q values of all the state-action pairs and the cell holding the maximum Q value is associated.In the training stage,the intelligent agent continuously generates samples through the trial-anderror method to train the DNN until convergence.In the application stage,state vectors of all the users are inputted to the trained DNN to quickly obtain a satisfied CA result of a scenario with the same BS locations and user distribution.Simulations demonstrate that the proposed scheme provides satisfied CA results in a computational time several orders of magnitudes shorter than traditional schemes.Meanwhile,performance metrics,such as capacity and fairness,can be guaranteed.
基金supported by the Universiti Tunku Abdul Rahman (UTAR) Malaysia under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05)
文摘The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of the fifth-generation (5G) mobile networks. Cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet the above requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance the energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build up a 3-layer CNN to capture the environment feature as an input state space. We then use DQN to turn on/off the RRHs dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct the simulation to compare our proposed scheme with nature DQN and the traditional approach.
基金supported by the Researcher Supporting Project number(RSPD2024R582),King Saud University,Riyadh,Saudi Arabia.
文摘The controller is a main component in the Software-Defined Networking(SDN)framework,which plays a significant role in enabling programmability and orchestration for 5G and next-generation networks.In SDN,frequent communication occurs between network switches and the controller,which manages and directs traffic flows.If the controller is not strategically placed within the network,this communication can experience increased delays,negatively affecting network performance.Specifically,an improperly placed controller can lead to higher end-to-end(E2E)delay,as switches must traverse more hops or encounter greater propagation delays when communicating with the controller.This paper introduces a novel approach using Deep Q-Learning(DQL)to dynamically place controllers in Software-Defined Internet of Things(SD-IoT)environments,with the goal of minimizing E2E delay between switches and controllers.E2E delay,a crucial metric for network performance,is influenced by two key factors:hop count,which measures the number of network nodes data must traverse,and propagation delay,which accounts for the physical distance between nodes.Our approach models the controller placement problem as a Markov Decision Process(MDP).In this model,the network configuration at any given time is represented as a“state,”while“actions”correspond to potential decisions regarding the placement of controllers or the reassignment of switches to controllers.Using a Deep Q-Network(DQN)to approximate the Q-function,the system learns the optimal controller placement by maximizing the cumulative reward,which is defined as the negative of the E2E delay.Essentially,the lower the delay,the higher the reward the system receives,enabling it to continuously improve its controller placement strategy.The experimental results show that our DQL-based method significantly reduces E2E delay when compared to traditional benchmark placement strategies.By dynamically learning from the network’s real-time conditions,the proposed method ensures that controller placement remains efficient and responsive,reducing communication delays and enhancing overall network performance.
基金This work was partially supported by the Open Funding of the Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data under Grant Number IPBED3supported by the National Natural Science Foundation of China(NSFC)under Grant Number 61971189supported by the Fundamental Research Funds for the Central Universities under Grant Number 2020MS001.
文摘Collaborative vehicular networks is a key enabler to meet the stringent ultra-reliable and lowlatency communications(URLLC)requirements.A user vehicle(UV)dynamically optimizes task offloading by exploiting its collaborations with edge servers and vehicular fog servers(VFSs).However,the optimization of task offloading in highly dynamic collaborative vehicular networks faces several challenges such as URLLC guaranteeing,incomplete information,and dimensionality curse.In this paper,we first characterize URLLC in terms of queuing delay bound violation and high-order statistics of excess backlogs.Then,a Deep Reinforcement lEarning-based URLLCAware task offloading algorithM named DREAM is proposed to maximize the throughput of the UVs while satisfying the URLLC constraints in a besteffort way.Compared with existing task offloading algorithms,DREAM achieves superior performance in throughput,queuing delay,and URLLC.
基金Supported by the National Ministries and Research Funds(3020020221111)
文摘A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. A robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait which was generated in advance foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on the uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm was validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tile angle is less than 3° after implementing the proposed method and the walking stability is obviously improved.
文摘In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role in enhancing the effectiveness,efficiency,and reliability of robotic systems.This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning(DRL).Through the utilization of advanced DRL algorithms,such as Q-Learning,Deep Q-Networks(DQN),Policy Gradient Methods,and Proximal Policy Optimization(PPO),the study aims to improve the performance of robotic arms in grasping objects with varying shapes,sizes,and environmental conditions.The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization,emphasizing the strengths and weaknesses of each algorithm.It also presents a comprehensive framework for training the DRL models,including simulation environment setup,the optimization process,and the evaluation metrics for grasping success.The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks.The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.
基金supported by the National Natural Science Foundation of China(Nos.61841107 and 62061024)Gansu Natural Sci-ence Foundation(Nos.22JR5RA274 and 23YFGA0062)Gansu Innovation Foundation(No.2022A-215).
文摘In this study,a solution based on deep Q network(DQN)is proposed to address the relay selection problem in cooperative non-orthogonal multiple access(NOMA)systems.DQN is particularly effective in addressing problems within dynamic and complex communication environ-ments.By formulating the relay selection problem as a Markov decision process(MDP),the DQN algorithm employs deep neural networks(DNNs)to learn and make decisions through real-time interactions with the communication environment,aiming to minimize the system’s outage proba-bility.During the learning process,the DQN algorithm progressively acquires channel state infor-mation(CSI)between two nodes,thereby minimizing the system’s outage probability until a sta-ble level is reached.Simulation results show that the proposed method effectively reduces the out-age probability by 82%compared to the two-way relay selection scheme(Two-Way)when the sig-nal-to-noise ratio(SNR)is 30 dB.This study demonstrates the applicability and advantages of the DQN algorithm in cooperative NOMA systems,providing a novel approach to addressing real-time relay selection challenges in dynamic communication environments.
文摘针对深度Q网络(DQN)算法因过估计导致收敛稳定性差的问题,在传统时序差分(TD)的基础上提出N阶TD误差的概念,设计基于二阶TD误差的双网络DQN算法。构造基于二阶TD误差的值函数更新公式,同时结合DQN算法建立双网络模型,得到两个同构的值函数网络分别用于表示先后两轮的值函数,协同更新网络参数,以提高DQN算法中值函数估计的稳定性。基于Open AI Gym平台的实验结果表明,在解决Mountain Car和Cart Pole问题方面,该算法较经典DQN算法具有更好的收敛稳定性。