The controller is a main component in the Software-Defined Networking (SDN) framework, which plays a significant role in enabling programmability and orchestration for 5G and next-generation networks. In SDN, frequent communication occurs between network switches and the controller, which manages and directs traffic flows. If the controller is not strategically placed within the network, this communication can experience increased delays, negatively affecting network performance. Specifically, an improperly placed controller can lead to higher end-to-end (E2E) delay, as switches must traverse more hops or encounter greater propagation delays when communicating with the controller. This paper introduces a novel approach using Deep Q-Learning (DQL) to dynamically place controllers in Software-Defined Internet of Things (SD-IoT) environments, with the goal of minimizing the E2E delay between switches and controllers. E2E delay, a crucial metric for network performance, is influenced by two key factors: hop count, which measures the number of network nodes data must traverse, and propagation delay, which accounts for the physical distance between nodes. Our approach models the controller placement problem as a Markov Decision Process (MDP). In this model, the network configuration at any given time is represented as a "state," while "actions" correspond to potential decisions regarding the placement of controllers or the reassignment of switches to controllers. Using a Deep Q-Network (DQN) to approximate the Q-function, the system learns the optimal controller placement by maximizing the cumulative reward, which is defined as the negative of the E2E delay. Essentially, the lower the delay, the higher the reward the system receives, enabling it to continuously improve its controller placement strategy. The experimental results show that our DQL-based method significantly reduces E2E delay compared to traditional benchmark placement strategies. By dynamically learning from the network's real-time conditions, the proposed method ensures that controller placement remains efficient and responsive, reducing communication delays and enhancing overall network performance.
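As a concrete illustration of the reward described above, the sketch below computes the negative of the average switch-to-controller E2E delay over a toy topology. The graph, per-link delays, and placement mapping are illustrative assumptions (using networkx), not the paper's actual simulation setup.

```python
# Illustrative sketch: reward = -(average switch-to-controller E2E delay),
# where delay accumulates per-link propagation delays along the shortest path.
# The topology, edge delays, and placement below are hypothetical examples.
import networkx as nx

def e2e_reward(graph, placement):
    """placement maps each switch to the node hosting its assigned controller."""
    total = 0.0
    for switch, controller in placement.items():
        # Shortest path weighted by per-link propagation delay (ms); hop count is
        # implicitly penalized because every extra hop adds its link delay.
        total += nx.shortest_path_length(graph, switch, controller, weight="delay")
    avg_delay = total / len(placement)
    return -avg_delay  # lower delay -> higher reward

# Toy 4-node topology with per-link propagation delays in milliseconds.
g = nx.Graph()
g.add_weighted_edges_from([(0, 1, 2.0), (1, 2, 1.5), (2, 3, 3.0), (0, 3, 5.0)], weight="delay")
print(e2e_reward(g, {0: 1, 2: 1, 3: 1}))  # switches 0, 2, 3 assigned to a controller at node 1
```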
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data that is used for training the neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with existing methods, our method can converge to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
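A minimal sketch of the experience-replay mechanism described above, assuming a standard FIFO buffer with uniform random minibatch sampling; the capacity and batch size are placeholder values, not the authors' settings.

```python
# Minimal experience-replay sketch (assumed structure, not the authors' code):
# transitions collected while the robot explores are stored and later sampled
# in random minibatches to train the Q-network, breaking temporal correlation.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer()
buffer.push(state=[0.0, 0.0], action=1, reward=-0.1, next_state=[0.0, 1.0], done=False)
print(len(buffer))
```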
To support dramatically increased traffic loads, communication networks are becoming ultra-dense. Traditional cell association (CA) schemes are time-consuming, forcing researchers to seek faster schemes. This paper proposes a deep Q-learning based scheme whose main idea is to train a deep neural network (DNN) to calculate the Q values of all state-action pairs, and the cell holding the maximum Q value is associated. In the training stage, the intelligent agent continuously generates samples through trial and error to train the DNN until convergence. In the application stage, the state vectors of all users are input to the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same BS locations and user distribution. Simulations demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than that of traditional schemes. Meanwhile, performance metrics such as capacity and fairness can be guaranteed.
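The application stage described above can be illustrated with a hedged PyTorch sketch: a trained DNN maps a user's state vector to one Q value per candidate cell, and the cell with the largest Q value is selected. The state dimension, number of cells, and layer sizes here are assumptions for illustration.

```python
# Hedged sketch of the application stage: a trained DNN outputs one Q value per
# candidate base station, and the user associates with the cell whose Q value is
# largest. Dimensions and layer sizes are assumptions, not the paper's settings.
import torch
import torch.nn as nn

NUM_CELLS = 8    # assumed number of candidate base stations
STATE_DIM = 16   # assumed user-state dimension (e.g., positions, channel gains)

q_network = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, NUM_CELLS),   # one Q value per cell
)

def associate(state_vector):
    with torch.no_grad():
        q_values = q_network(state_vector)   # shape: (NUM_CELLS,)
    return int(torch.argmax(q_values))       # index of the cell to associate with

user_state = torch.randn(STATE_DIM)          # placeholder state vector
print("associated cell:", associate(user_state))
```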
Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its future actions more efficiently. DRL has been used in many application fields, including games, robots, and networks, for creating autonomous systems that improve themselves with experience. It is well acknowledged that DRL is well suited to solving optimization problems in distributed systems in general and network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS) is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm. The main objective of this approach is to achieve better retrieval effectiveness with reduced search cost, i.e., fewer connected peers, fewer exchanged messages, and reduced time. The simulation results show a significant improvement in searching for a resource compared to k-Random Walker and Directed BFS. Here, the retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
Beamforming is significant for millimeter wave multi-user massive multi-input multi-output systems. Meanwhile, the overhead cost of channel state information and beam training is considerable, especially in dynamic environments. To reduce the overhead cost, we propose a multi-user beam tracking algorithm using a distributed deep Q-learning method. With online learning of users' moving trajectories, the proposed algorithm learns to scan a beam subspace to maximize the average effective sum rate. Considering practical implementation, we model the continuous beam tracking problem as a non-Markov decision process and thus develop a simplified training scheme of deep Q-learning to reduce the training complexity. Furthermore, we propose a scalable state-action-reward design for scenarios with different user and antenna numbers. Simulation results verify the effectiveness of the designed method.
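One plausible reading of "scanning a beam subspace" is epsilon-greedy selection of the top-Q beams from a codebook; the sketch below follows that reading only as an illustration, and the codebook size, subspace size, and epsilon are assumptions rather than the paper's parameters.

```python
# Illustrative epsilon-greedy beam-subspace selection (an assumption about one way a
# per-user agent could pick which beams to scan; all parameters are placeholders).
import numpy as np

CODEBOOK_SIZE = 32   # assumed number of candidate beams
SUBSPACE_SIZE = 4    # assumed number of beams scanned per slot

def select_beam_subspace(q_values, epsilon=0.1, rng=np.random.default_rng()):
    """Pick SUBSPACE_SIZE beams: usually the highest-Q beams, occasionally random ones."""
    if rng.random() < epsilon:
        return rng.choice(CODEBOOK_SIZE, size=SUBSPACE_SIZE, replace=False)
    return np.argsort(q_values)[-SUBSPACE_SIZE:]   # indices of the top-Q beams

q = np.zeros(CODEBOOK_SIZE)                        # untrained Q estimates
print(select_beam_subspace(q))
```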
To reduce the transmission latency and mitigate the backhaul burden of centralized cloud-based network services, mobile edge computing (MEC) has been drawing increased attention from both industry and academia recently. This paper focuses on mobile users' computation offloading problem in wireless cellular networks with mobile edge computing, with the purpose of optimizing the computation offloading decision-making policy. Since wireless network states and computing requests have stochastic properties and the environment's dynamics are unknown, we use the model-free reinforcement learning (RL) framework to formulate and tackle the computation offloading problem. Each mobile user learns through interactions with the environment, estimates its performance in the form of a value function, and then chooses the overhead-aware optimal computation offloading action (local computing or edge computing) based on its state. The state space is high-dimensional in our work, and the value function is unrealistic to estimate directly. Consequently, we use a deep reinforcement learning algorithm, which combines the RL method Q-learning with a deep neural network (DNN) to approximate the value functions for complicated control applications, and the optimal policy is obtained when the value function converges. Simulation results show the effectiveness of the proposed method in comparison with baseline methods in terms of the total overhead of all mobile users.
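A hedged sketch of the per-user decision step described above: a Q-network scores the two offloading actions (local computing vs. edge computing) for the current state, and the action with the larger Q value is taken. The state encoding and network sizes are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of the binary offloading decision: the Q-network outputs
# Q(state, local) and Q(state, offload); the larger one determines the action.
# Dimensions, layer sizes, and the state contents are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM = 10   # e.g., channel quality, task size, queue length (assumed encoding)

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 32), nn.ReLU(),
    nn.Linear(32, 2),          # index 0: local computing, index 1: edge computing
)

def offloading_decision(state):
    with torch.no_grad():
        q = q_net(state)
    return "edge computing" if torch.argmax(q) == 1 else "local computing"

print(offloading_decision(torch.randn(STATE_DIM)))
```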
Currently, edge Artificial Intelligence (AI) systems have significantly facilitated the functionalities of intelligent devices such as smartphones and smart cars, and supported diverse applications and services. This fundamental support comes from continuous data analysis and computation over these devices. Considering the resource constraints of terminal devices, multi-layer edge AI systems improve the overall computing power of the system by scheduling computing tasks to edge and cloud servers for execution. Previous efforts tend to ignore the strongly pipelined nature of processing tasks in edge AI systems, such as the encryption, decryption, and consensus algorithms supporting the implementation of blockchain techniques. Therefore, this paper proposes a new pipelined task scheduling algorithm (referred to as PTS-RDQN), which utilizes the system representation ability of deep reinforcement learning and integrates multi-dimensional information to achieve global task scheduling. Specifically, a co-optimization strategy based on Rainbow Deep Q-Learning (RainbowDQN) is proposed to allocate computation tasks to mobile devices, edge servers, and cloud servers; it comprehensively considers the balance of task turnaround time, link quality, and other factors, thus effectively improving system performance and user experience. In addition, a task scheduling strategy based on PTS-RDQN is proposed, which is capable of realizing dynamic task allocation according to device load. The results of extensive simulation experiments show that the proposed method can effectively improve resource utilization and provide an effective task scheduling strategy for edge computing systems with a cloud-edge-end architecture.
To address the shortcomings of traditional network anomaly diagnosis algorithms in unsupervised environments, such as low accuracy in anomaly localization and anomalous-data classification, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, the data flows of the wireless network are collected on the basis of ADUs (Asynchronous Data Units) and packet features are extracted. Then, a Q-learning model is constructed to explore the balance point between state values and reward values, and the SA (Simulated Annealing) algorithm is used to accurately identify the next state from a global perspective. Finally, the joint distribution probability of the training samples is determined to improve the approximation performance of the output values and achieve a balance between exploration and cost. Test results show that the improved Q-learning algorithm achieves an average network anomaly localization accuracy of 99.4% and also outperforms three traditional network anomaly diagnosis methods in terms of classification accuracy and classification efficiency for different types of network anomalies.
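One common way to combine Q-learning with simulated annealing is a temperature-controlled (Boltzmann) exploration schedule that is gradually cooled; the sketch below follows that interpretation, which is an assumption about the paper's mechanism, with a toy environment standing in for the wireless diagnosis task.

```python
# Hedged sketch of Q-learning with a simulated-annealing-style (Boltzmann) schedule:
# action probabilities follow exp(Q/T) and the temperature T is cooled each episode,
# shifting from exploration toward exploitation. This is one reading of "Q-learning + SA";
# the paper's exact mechanism may differ.
import numpy as np

def boltzmann_action(q_row, temperature, rng):
    logits = q_row / max(temperature, 1e-6)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(q_row), p=probs)

def q_learning_sa(env_step, n_states, n_actions, episodes=200,
                  alpha=0.1, gamma=0.9, t0=1.0, cooling=0.995):
    rng = np.random.default_rng(0)
    q_table = np.zeros((n_states, n_actions))
    temperature = t0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = boltzmann_action(q_table[state], temperature, rng)
            next_state, reward, done = env_step(state, action)   # caller-supplied environment
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * q_table[next_state].max() * (not done)
            q_table[state, action] += alpha * (target - q_table[state, action])
            state = next_state
        temperature *= cooling   # annealing: cool the exploration temperature
    return q_table

# Toy 1-D environment: move left/right over 5 states; reaching state 4 ends the episode.
def toy_env(state, action):
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    return next_state, (1.0 if next_state == 4 else -0.1), next_state == 4

print(q_learning_sa(toy_env, n_states=5, n_actions=2).round(2))
```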
Funding for the SD-IoT controller placement work: supported by the Researcher Supporting Project number (RSPD2024R582), King Saud University, Riyadh, Saudi Arabia.
Funding for the robot path planning work: supported by the National Natural Science Foundation of China (61751210, 61572441).
Funding for the cell association work: supported by the Fundamental Research Funds for the Central Universities of China under grant no. PA2019GDQT0012, by the National Natural Science Foundation of China (Grant No. 61971176), and by the Applied Basic Research Program of Wuhan City, China, under grant 2017010201010117.
Funding for the P2P query routing work: the authors thank the Deanship of Scientific Research at Shaqra University for supporting this work under Project No. g01/n04.
Funding for the computation offloading work: supported by the National Natural Science Foundation of China (61571059 and 61871058).