Funding: funded by the National Natural Science Foundation of China (No. 62063006), the Guangxi Science and Technology Major Program (No. 2022AA05002), the Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003), and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).
Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure that high-quality data are used frequently. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared to the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps. The performance of the proposed algorithm is improved by 11.43% in the fixed scenario and by 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
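As one way to picture the averaged target described above, the sketch below computes a DDQN target that averages the estimates of the last K stored target-network snapshots instead of using only the current target network. The snapshot count K, the discount factor, and the toy linear networks are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Minimal sketch: a DDQN target that averages the last K target-network estimates
# rather than using only the most recent target network (one way "average Q-value
# estimation" could be realized; K and GAMMA are illustrative).

GAMMA = 0.99
K = 5  # number of stored target-network snapshots to average over

def ddqn_target(reward, next_state, done, online_q, target_q_history):
    """online_q(s) and each q(s) in target_q_history return a vector of Q-values."""
    # Double DQN: the online network selects the action ...
    best_action = int(np.argmax(online_q(next_state)))
    # ... and the stored target networks evaluate it; the evaluations are averaged.
    evals = [q(next_state)[best_action] for q in target_q_history[-K:]]
    avg_q = float(np.mean(evals))
    return reward + (0.0 if done else GAMMA * avg_q)

# Toy usage with random linear "networks" standing in for the real ones.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)) for _ in range(K + 1)]
online_q = lambda s: s @ weights[0]
history = [(lambda s, w=w: s @ w) for w in weights[1:]]
print(ddqn_target(reward=1.0, next_state=rng.normal(size=4), done=False,
                  online_q=online_q, target_q_history=history))
```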
Funding: supported by the National Natural Science Foundation of China (Nos. 61841107 and 62061024), the Gansu Natural Science Foundation (Nos. 22JR5RA274 and 23YFGA0062), and the Gansu Innovation Foundation (No. 2022A-215).
Abstract: In this study, a solution based on deep Q network (DQN) is proposed to address the relay selection problem in cooperative non-orthogonal multiple access (NOMA) systems. DQN is particularly effective in addressing problems within dynamic and complex communication environments. By formulating the relay selection problem as a Markov decision process (MDP), the DQN algorithm employs deep neural networks (DNNs) to learn and make decisions through real-time interactions with the communication environment, aiming to minimize the system's outage probability. During the learning process, the DQN algorithm progressively acquires the channel state information (CSI) between the two nodes, driving the system's outage probability down until a stable level is reached. Simulation results show that the proposed method reduces the outage probability by 82% compared to the two-way relay selection scheme (Two-Way) when the signal-to-noise ratio (SNR) is 30 dB. This study demonstrates the applicability and advantages of the DQN algorithm in cooperative NOMA systems, providing a novel approach to addressing real-time relay selection challenges in dynamic communication environments.
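The sketch below illustrates, under assumed details, how one step of the relay-selection MDP might look: the state is the current CSI, the action is a relay index chosen epsilon-greedily from Q-values, and the reward reflects whether the chosen relay avoids an outage. The rate threshold, fading model, and reward values are assumptions for illustration only, not the paper's formulation.

```python
import numpy as np

# Minimal sketch of relay selection as an MDP step (assumed formulation:
# state = current CSI estimates, action = relay index, reward = 1 if the chosen
# relay avoids an outage, else 0; the rate threshold is illustrative).

RATE_THRESHOLD = 1.0  # bits/s/Hz, assumed outage threshold

def step(csi_gains, snr_linear, relay_idx):
    rate = np.log2(1.0 + snr_linear * csi_gains[relay_idx])
    reward = 1.0 if rate >= RATE_THRESHOLD else 0.0  # 1 when no outage, 0 on outage
    return reward

def epsilon_greedy_relay(q_values, epsilon=0.1, rng=np.random.default_rng()):
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore a random relay
    return int(np.argmax(q_values))               # exploit the best-valued relay

# Toy usage: 4 candidate relays at SNR = 30 dB.
rng = np.random.default_rng(1)
gains = rng.exponential(scale=1.0, size=4)        # Rayleigh-fading power gains
action = epsilon_greedy_relay(rng.normal(size=4), rng=rng)
print(action, step(gains, snr_linear=10 ** (30 / 10), relay_idx=action))
```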
Abstract: Autonomous navigation of mobile robots is a challenging task that requires them to travel from their initial position to their destination without collision. Reinforcement learning methods enable a mobile robot to learn a state-action function suited to its environment: through trial-and-error interaction with its surroundings, the robot finds an ideal behavior on its own. The Deep Q Network (DQN) algorithm is used on TurtleBot 3 (TB3) to reach the goal while successfully avoiding obstacles, but it requires a large number of training iterations. This research mainly focuses on a mobile robot's best path prediction using the DQN and Artificial Potential Field (APF) algorithms. First, a TB3 Waffle Pi DQN is built and trained to reach the goal. Then the APF shortest-path algorithm is incorporated into the DQN algorithm. The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operating System (ROS). The simulation results show that the combination of DQN and APF is effective: it gives a better optimal path and takes less time than the conventional DQN algorithm. Compared with DQN, the performance improvement rate of the proposed DQN+APF in terms of the number of successful targets reaches 88%, and the improvement in average time is 0.331 s. In terms of average rewards, the proposed DQN+APF attains 85% for the positive goal and −90% for the negative goal.
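For reference, the following is a minimal sketch of the classical APF step that such a hybrid planner could use to bias motion toward the goal and away from obstacles; the gains and the obstacle influence radius are illustrative, not the paper's tuned values.

```python
import numpy as np

# Minimal sketch of the classical Artificial Potential Field (APF) step:
# an attractive pull toward the goal plus repulsive pushes away from nearby
# obstacles. Gains and influence radius are illustrative assumptions.

K_ATT, K_REP, RHO_0 = 1.0, 100.0, 1.5  # attractive gain, repulsive gain, obstacle influence radius

def apf_force(pos, goal, obstacles):
    force = K_ATT * (goal - pos)                      # attractive pull toward the goal
    for obs in obstacles:
        diff = pos - obs
        rho = np.linalg.norm(diff)
        if 1e-9 < rho < RHO_0:                        # repulsion only inside the influence radius
            force += K_REP * (1.0 / rho - 1.0 / RHO_0) / rho**2 * (diff / rho)
    return force

# Toy usage: one planning step for a robot at the origin.
pos, goal = np.zeros(2), np.array([4.0, 3.0])
obstacles = [np.array([1.0, 1.0])]
direction = apf_force(pos, goal, obstacles)
print(pos + 0.1 * direction / np.linalg.norm(direction))  # next waypoint along the combined force
```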
Abstract: To address the transmission time and cost of files in large engineering projects, a file-workflow-based optimization method for engineering project document management is proposed. First, an engineering project document management environment and a file workflow model with logical ordering are constructed, and file transmission and caching are analyzed. On this basis, the document management optimization problem is modeled as a Markov process, and the task completion time and caching cost of the file workflow are jointly optimized by designing the state space, action space, and reward function. Second, a dueling double deep Q network (D3QN) is adopted to reduce training time and improve training efficiency. Simulation results verify the effectiveness of the proposed scheme for file transmission under different parameter configurations, and the scheme maintains good optimization capability as the task volume increases.
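A minimal sketch of how the joint objective could enter the MDP as a reward is shown below, assuming a linear trade-off between task completion time and caching cost; the weights and the cost model are illustrative assumptions rather than the paper's design.

```python
# Minimal sketch of a joint reward trading off task completion time against caching
# cost, as one way the file-workflow MDP reward could be set up; the weights and the
# linear form are assumptions for illustration.

W_TIME, W_CACHE = 1.0, 0.2  # illustrative weights for the two objectives

def workflow_reward(transfer_time_s, cached_bytes, cache_price_per_mb=0.01):
    cache_cost = cache_price_per_mb * cached_bytes / 1e6
    # Negative weighted sum: the agent is rewarded for finishing sooner and caching less.
    return -(W_TIME * transfer_time_s + W_CACHE * cache_cost)

# Toy usage: compare caching a 50 MB file (fast re-use) vs. always re-transmitting it.
print(workflow_reward(transfer_time_s=2.0, cached_bytes=50e6))   # cache the file
print(workflow_reward(transfer_time_s=8.0, cached_bytes=0.0))    # always re-transmit
```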
Funding: This work was supported by the Fundamental Research Funds for the Central Universities (No. 2019XD-A07), the Director Fund of Beijing Key Laboratory of Space-ground Interconnection and Convergence, and the National Key Laboratory of Science and Technology on Vacuum Electronics.
Abstract: The main aim of future mobile networks is to provide secure, reliable, intelligent, and seamless connectivity. It also enables mobile network operators to ensure a better quality of service (QoS) for their customers. Nowadays, Unmanned Aerial Vehicles (UAVs) are a significant part of the mobile network due to their continuously growing use in various applications. For better coverage, cost-effectiveness, and seamless service connectivity and provisioning, UAVs have emerged as the best choice for telco operators. UAVs can be used as flying base stations, edge servers, and relay nodes in mobile networks. On the other hand, Multi-access Edge Computing (MEC) technology has also emerged in the 5G network to provide a better quality of experience (QoE) to users with different QoS requirements. However, using UAVs in a mobile network for coverage enhancement and better QoS faces several challenges, such as trajectory design, path planning, optimization, QoS assurance, and mobility management. Efficient and proactive path planning and optimization in a highly dynamic environment containing buildings and obstacles is challenging, so an automated Artificial Intelligence (AI)-enabled, QoS-aware solution is needed for trajectory planning and optimization. Therefore, this work introduces a well-designed AI- and MEC-enabled architecture for a UAV-assisted future network. It has an efficient Deep Reinforcement Learning (DRL) algorithm for real-time and proactive trajectory planning and optimization, and it fulfills QoS-aware service provisioning. A greedy-policy approach is used to maximize the long-term reward for serving more users with QoS. Simulation results reveal the superiority of the proposed DRL mechanism for energy-efficient and QoS-aware trajectory planning over the existing models.
Funding: supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been one of the research hotspots in recent years. With the continuous improvement of UAV autonomous intelligence, swarm technology will become one of the main trends in UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-duration tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves its convergence speed, and speeds up the training process. Different from the traditional task environment, this paper establishes a continuous state-space task environment model to improve the authentication process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, and it improves the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
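One plausible reading of the temporary storage area is sketched below: transitions of the running episode are held in a temporary buffer and committed to the long-term replay memory only when the episode ends. The capacity and the commit rule are assumptions, not the paper's exact design.

```python
import random
from collections import deque

# Minimal sketch of a replay memory with a per-episode temporary storage area:
# transitions are held back until the episode finishes and are then committed
# in one batch (an assumed reading of the abstract's "temporary storage area").

class TempStorageReplay:
    def __init__(self, capacity=50_000):
        self.memory = deque(maxlen=capacity)  # long-term replay memory
        self.temp = []                        # temporary area for the running episode

    def store(self, transition):
        self.temp.append(transition)          # buffer within the current episode

    def end_episode(self):
        self.memory.extend(self.temp)         # commit the whole trajectory at once
        self.temp.clear()

    def sample(self, batch_size):
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))

# Toy usage: one short episode, then a sampled minibatch.
buf = TempStorageReplay()
for t in range(5):
    buf.store((f"s{t}", "a", 0.0 if t < 4 else 1.0, f"s{t+1}", t == 4))
buf.end_episode()
print(buf.sample(2))
```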
Funding: supported by the National Natural Science Foundation of China (No. 61931011).
Abstract: Unmanned aerial vehicles (UAVs) are increasingly considered in safe autonomous navigation systems for exploring unknown environments, where the UAVs are equipped with multiple sensors to perceive their surroundings. However, achieving UAV-enabled data dissemination while simultaneously ensuring safe navigation is a new challenge. In this paper, our goal is to minimize the overall weighted sum of the UAV's task completion time while satisfying the data transmission task requirement and the UAV's feasible flight region constraints. This problem cannot be solved via standard optimization methods, mainly because a tractable and accurate system model is lacking in practice. To overcome this issue, we propose a new solution approach that utilizes the dueling double deep Q network (dueling DDQN) with multi-step learning. Specifically, to improve the algorithm, extra labels are added to the primitive states. Simulation results indicate the validity and performance superiority of the proposed algorithm under different data thresholds compared with two other benchmarks.
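The sketch below shows the multi-step (n-step) double-Q target that such a dueling DDQN could bootstrap from: rewards over n steps are accumulated, the online network selects the bootstrap action, and the target network evaluates it. The step count, discount factor, and toy networks are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the multi-step (n-step) double-Q target used with a dueling
# DDQN: n rewards are accumulated before bootstrapping; N_STEP, GAMMA, and the
# toy linear networks are illustrative assumptions.

GAMMA, N_STEP = 0.99, 3

def n_step_double_q_target(rewards, boot_state, done, online_q, target_q):
    """rewards: the n rewards following the stored transition."""
    g = sum((GAMMA ** i) * r for i, r in enumerate(rewards[:N_STEP]))
    if not done:
        a_star = int(np.argmax(online_q(boot_state)))          # select with the online net
        g += (GAMMA ** len(rewards[:N_STEP])) * target_q(boot_state)[a_star]  # evaluate with the target net
    return g

# Toy usage with linear stand-ins for the dueling networks.
rng = np.random.default_rng(0)
w_online, w_target = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
print(n_step_double_q_target([0.1, 0.0, 1.0], rng.normal(size=6), False,
                             lambda s: s @ w_online, lambda s: s @ w_target))
```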
Funding: supported by the National Key Research and Development Program of China (No. 2021YFE0116900), the National Natural Science Foundation of China (Nos. 42275157, 62002276, and 41975142), and the Major Program of the National Social Science Fund of China (No. 17ZDA092).
Abstract: Edge computing nodes undertake an increasing number of tasks with the rise of business density. Therefore, how to efficiently allocate large-scale and dynamic workloads to edge computing resources has become a critical challenge. This study proposes an edge task scheduling approach based on an improved Double Deep Q Network (DQN), in which the calculation of target Q values and the selection of actions are separated into two networks. A new reward function is designed, and a control unit is added to the experience replay unit of the agent. The management of experience data is also modified to fully utilize its value and improve learning efficiency. Reinforcement learning agents usually learn from an ignorant state, which is inefficient. Therefore, this study also proposes a novel particle swarm optimization algorithm with an improved fitness function that can generate optimal solutions for task scheduling. These optimized solutions are provided to the agent to pre-train the network parameters and obtain a better cognition level. The proposed algorithm is compared with six other methods in simulation experiments. Results show that the proposed algorithm outperforms the benchmark methods in terms of makespan.
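To illustrate the pre-training source, the sketch below uses a plain particle swarm optimization loop to search for a task-to-node assignment that minimizes makespan; such solutions could then serve as pre-training targets for the agent. The swarm size, coefficients, random costs, and the rounding of particle positions to node indices are illustrative assumptions, not the paper's improved fitness function.

```python
import numpy as np

# Minimal sketch of PSO searching for a task-to-node assignment that minimizes
# makespan; the result could seed pre-training of the scheduling agent.

rng = np.random.default_rng(0)
N_TASKS, N_NODES = 12, 4
exec_time = rng.uniform(1.0, 5.0, size=(N_TASKS, N_NODES))  # runtime of each task on each node

def makespan(assignment):
    loads = np.zeros(N_NODES)
    for task, node in enumerate(assignment):
        loads[node] += exec_time[task, node]
    return loads.max()  # completion time of the busiest node

def pso_schedule(n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(0, N_NODES, size=(n_particles, N_TASKS))   # continuous positions
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([makespan(p.astype(int)) for p in pos])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, N_NODES - 1e-6)              # keep positions in valid range
        cost = np.array([makespan(p.astype(int)) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest.astype(int), float(pbest_cost.min())

schedule, best_makespan = pso_schedule()
print(schedule, round(best_makespan, 2))  # an assignment usable as pre-training targets
```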
Funding: supported by the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: To address the shortcomings of single-step decision making in existing deep reinforcement learning based unmanned aerial vehicle (UAV) real-time path planning, a real-time UAV path planning algorithm based on a long short-term memory network (RPP-LSTM) is proposed, which combines the memory characteristics of recurrent neural networks (RNNs) with a deep reinforcement learning algorithm. In this algorithm, LSTM networks are used as the Q-value networks of the deep Q network (DQN) algorithm, which gives the Q-value network's decisions a form of memory. Thanks to the LSTM network, the Q-value network can use previous environmental and action information, which effectively avoids the problem of single-step decisions that consider only the current environment. In addition, the algorithm proposes a hierarchical reward and punishment function for the specific problem of UAV real-time path planning, so that the UAV can perform path planning more reasonably. Simulation verification shows that, compared with the traditional feed-forward neural network (FNN) based UAV autonomous path planning algorithm, the proposed RPP-LSTM can adapt to more complex environments and has significantly improved robustness and accuracy in UAV real-time path planning.
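A minimal sketch of an LSTM serving as the Q-value network of a DQN is given below, so that the Q-values at each step depend on a short history of observations; the layer sizes and observation dimensions are illustrative assumptions rather than the RPP-LSTM configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of an LSTM used as the Q-value network of a DQN, so that the
# Q-values at each step depend on the recent history of observations rather than
# only the current one. Sizes are illustrative assumptions.

class LSTMQNetwork(nn.Module):
    def __init__(self, obs_dim=16, hidden_dim=64, n_actions=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); the hidden state carries memory across calls.
        out, hidden = self.lstm(obs_seq, hidden)
        q_values = self.head(out[:, -1])       # Q-values for the latest time step
        return q_values, hidden

# Toy usage: greedy action from a history of 5 observations.
net = LSTMQNetwork()
history = torch.randn(1, 5, 16)
q, h = net(history)
print(int(q.argmax()))
```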
Funding: supported by the Natural Science Basic Research Program of Shaanxi (Program No. 2022JQ-593).
Abstract: To improve the performance of a UAV's autonomous maneuvering decision-making, this paper proposes a decision-making method based on situational continuity. The method designs a situation evaluation function with strong guidance and then trains a Long Short-Term Memory (LSTM) network under the framework of the Deep Q Network (DQN) for air combat maneuvering decision-making. Considering the continuity between adjacent situations, the method takes multiple consecutive situations as one input to the neural network. To reflect the difference between adjacent situations, it takes the difference of the situation evaluation values as the reinforcement learning reward. In different scenarios, the proposed algorithm is compared with an algorithm based on a fully connected neural network (FNN) and an algorithm based on statistical principles. The results show that, compared with the FNN-based algorithm, the proposed algorithm is more accurate and forward-looking; compared with the algorithm based on statistical principles, its decision-making is more efficient and its real-time performance is better.
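The sketch below illustrates the two ideas in code form under assumed details: several consecutive situation vectors are stacked into one network input, and the reward is the change in a situation evaluation value between adjacent steps. The evaluation function used here (a simple range-plus-angle advantage) is a stand-in, not the paper's function.

```python
import numpy as np

# Minimal sketch of (1) a reward built from the change of a situation evaluation
# value between adjacent time steps, and (2) stacking consecutive situations into
# one network input. The evaluation function is an illustrative stand-in.

def situation_value(own_pos, enemy_pos, own_heading):
    rel = enemy_pos - own_pos
    dist = np.linalg.norm(rel)
    angle_off = np.arccos(np.clip(rel @ own_heading / (dist + 1e-9), -1.0, 1.0))
    return np.exp(-dist / 5000.0) + (1.0 - angle_off / np.pi)  # prefer close, nose-on geometry

def shaped_reward(prev_situation, curr_situation):
    # Reward is the improvement of the evaluation value between adjacent situations.
    return situation_value(*curr_situation) - situation_value(*prev_situation)

def stack_situations(history, window=4):
    # Concatenate the last `window` situation feature vectors into one network input.
    return np.concatenate(history[-window:])

# Toy usage: the own aircraft closes distance, so the shaped reward is positive.
prev = (np.array([0.0, 0.0]), np.array([4000.0, 1000.0]), np.array([1.0, 0.0]))
curr = (np.array([500.0, 0.0]), np.array([4000.0, 1000.0]), np.array([1.0, 0.0]))
print(round(shaped_reward(prev, curr), 4))
print(stack_situations([np.zeros(6)] * 4).shape)
```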
Funding: supported by the Transportation Science and Technology Project of the Liaoning Provincial Department of Education (Grant No. 202243), the Provincial Key Laboratory Project (Grant No. GJZZX2022KF05), and the Natural Science Foundation of Liaoning Province (Grant No. 2019-ZD-0094).
Abstract: The advanced diagnosis of faults in railway point machines is crucial for ensuring the smooth operation of the turnout conversion system and the safe functioning of trains. Signal processing and deep learning based methods have been extensively explored in the realm of fault diagnosis. While these approaches effectively extract fault features and facilitate the creation of end-to-end diagnostic models, they often demand considerable expert experience and manual intervention in feature selection, structural construction, and parameter optimization of neural networks. This reliance on manual effort can result in weak generalization performance and a lack of intelligence in the model. To address these challenges, this study introduces an intelligent fault diagnosis method based on deep reinforcement learning (DRL). Initially, a one-dimensional convolutional neural network agent is established, leveraging the specific characteristics of point machine fault data to automatically extract diverse features across multiple scales. Subsequently, a deep Q network is incorporated as the central component of the diagnostic framework. The fault classification interactive environment is meticulously designed, and the agent training network is optimized. Through extensive interaction between the agent and the environment using fault data, satisfactory cumulative rewards and effective fault classification strategies are achieved. Experimental results demonstrate the method's high efficacy, with a training accuracy of 98.9% and a commendable test accuracy of 98.41%. Notably, the use of DRL for the fault diagnosis of railway point machines enhances the intelligence of the diagnostic process, particularly through its excellent independent exploration capability.
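As a rough illustration of the interactive environment idea, the sketch below treats each fault signal as a state, the agent's action as the predicted fault class, and rewards a correct classification; the +1/-1 reward values and the synthetic signals are assumptions, not the paper's design.

```python
import numpy as np

# Minimal sketch of a fault-classification "environment" in which a DQN-style agent
# sees one fault signal at a time, its action is the predicted fault class, and it
# is rewarded for a correct classification (reward values are assumptions).

class FaultClassificationEnv:
    def __init__(self, signals, labels, rng=None):
        self.signals, self.labels = signals, labels
        self.rng = rng or np.random.default_rng()
        self.idx = 0

    def reset(self):
        self.idx = int(self.rng.integers(len(self.signals)))
        return self.signals[self.idx]

    def step(self, predicted_class):
        reward = 1.0 if predicted_class == self.labels[self.idx] else -1.0
        next_state = self.reset()            # draw the next fault sample
        return next_state, reward

# Toy usage with 3 fault classes and synthetic 1-D current signals.
rng = np.random.default_rng(0)
signals = rng.normal(size=(100, 256))
labels = rng.integers(0, 3, size=100)
env = FaultClassificationEnv(signals, labels, rng)
state = env.reset()
_, r = env.step(predicted_class=int(labels[env.idx]))  # a correct guess earns +1
print(r)
```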
Funding: supported by the Fundamental Research Funds Program for the Central Universities (No. 2019MS014) and the Key-Area Research and Development Program of Guangdong Province (No. 2020B010166004).
Abstract: With the integration of alternative energy and renewables, the stability and resilience of the power network have received considerable attention. The basic necessity for fault diagnosis and isolation is fault identification and location. Conventional intelligent fault identification methods need supervision and manual labelling of characteristics, and they require large amounts of labelled data. To enhance the ability of intelligent methods and remove the dependence on large amounts of labelled data, a novel fault identification method based on deep reinforcement learning (DRL), which has not received enough attention in the field of fault identification, is investigated in this paper. The proposed method uses different faults as parameters of the model to expand the scope of fault identification. In addition, the DRL algorithm can intelligently modify the fault parameters according to the observations obtained from the power network environment, rather than requiring manual and mechanical tuning of parameters. The methodology was tested on the IEEE 14-bus system for several scenarios, and the performance of the proposed method was compared with that of population-based optimization methods and supervised learning methods. The obtained results confirm the feasibility and effectiveness of the proposed method.
Funding: supported by the National Research Foundation (NRF), Singapore, and the Civil Aviation Authority of Singapore (CAAS), under the Aviation Transformation Programme (ATP).
Abstract: Unmanned aerial vehicles (UAVs) have gained much attention from academia and industry due to the significant number of potential applications in urban airspace. A traffic management system is needed to manage this future UAV traffic. Tactical conflict resolution for unmanned aerial systems (UASs) is an essential piece of the puzzle for future UAS Traffic Management (UTM), especially in very low-level (VLL) urban airspace. Unlike conflict resolution in higher-altitude airspace, dense high-rise buildings are an essential source of potential conflict to be considered in VLL urban airspace. In this paper, we propose an attention-based deep reinforcement learning approach to solve the tactical conflict resolution problem. Specifically, we formulate this task as a sequential decision-making problem using a Markov Decision Process (MDP). The double deep Q network (DDQN) framework is used as the learning framework for the host drone, which learns to output conflict-free maneuvers at each time step. We use the attention mechanism to model each individual neighbor's effect on the host drone, allowing the learned conflict resolution policy to adapt to an arbitrary number of neighboring drones. Lastly, we build a simulation environment with various scenarios covering different types of encounters to evaluate the proposed approach. The simulation results demonstrate that our proposed algorithm provides a reliable solution that minimizes secondary conflict counts compared to learning-based and non-learning-based approaches under different traffic density scenarios.
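A minimal sketch of attention over a variable number of neighboring drones is shown below: the host drone's state forms the query, each neighbor's state forms a key and value, and the attention-pooled context feeds a Q-value head over candidate maneuvers. Feature sizes and the single-head form are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of attention over a variable number of neighboring drones: the
# host drone's state is the query, each neighbor's relative state is a key/value,
# and the pooled context feeds the DDQN head. Sizes are illustrative assumptions.

class NeighborAttentionEncoder(nn.Module):
    def __init__(self, host_dim=6, nbr_dim=6, embed_dim=32, n_actions=9):
        super().__init__()
        self.q_proj = nn.Linear(host_dim, embed_dim)
        self.k_proj = nn.Linear(nbr_dim, embed_dim)
        self.v_proj = nn.Linear(nbr_dim, embed_dim)
        self.q_head = nn.Linear(host_dim + embed_dim, n_actions)

    def forward(self, host_state, neighbor_states):
        # neighbor_states: (n_neighbors, nbr_dim) -- n_neighbors can vary per call.
        q = self.q_proj(host_state)                              # (embed_dim,)
        k, v = self.k_proj(neighbor_states), self.v_proj(neighbor_states)
        scores = (k @ q) / (q.shape[-1] ** 0.5)                  # one score per neighbor
        weights = torch.softmax(scores, dim=0)
        context = weights @ v                                    # weighted neighbor summary
        return self.q_head(torch.cat([host_state, context]))     # Q-values for maneuvers

# Toy usage: 3 neighbors now, 7 neighbors later, same policy network.
net = NeighborAttentionEncoder()
print(net(torch.randn(6), torch.randn(3, 6)).argmax().item())
print(net(torch.randn(6), torch.randn(7, 6)).argmax().item())
```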
Funding: supported by the National Key R&D Program of China (Grant Nos. 2018YFC1900804 and 2019YFC1906002).
Abstract: With the appearance of a huge number of reusable electronic products, precise value evaluation has become an urgent problem to be solved in the recycling process. Traditional methods mostly rely on manual intervention. To make the model more suitable for dynamic updating, this paper proposes a reinforcement learning based value prediction model for electronic products that integrates market information to achieve timely and stable prediction results. The basic attributes and the depreciation attributes of a product are modeled by two parallel neural networks to learn their different effects on the prediction. Most importantly, the double deep Q network is adopted to fuse market information through a reinforcement learning strategy, and training on old product data can be used to predict newly appearing products, which alleviates the cold-start problem. Experiments on data from a real mobile phone recycling platform verify that the model achieves higher accuracy and better generalization ability.