Funding: supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years, and as the autonomous intelligence of UAVs continues to improve, swarming will become one of the main trends in future UAV development. This paper studies the behavior decision-making process for UAV swarm rendezvous tasks based on the double deep Q-network (DDQN) algorithm. We design a guided reward function that effectively addresses the convergence problems caused by sparse rewards in deep reinforcement learning (DRL) over long-horizon tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves its convergence, and speeds up training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the verification of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results validate that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
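As a rough illustration of the mechanisms this abstract describes, the sketch below shows a guided (distance-shaped) reward, a replay memory with a temporary storage area that is flushed into the main memory at episode end, and the double-DQN target computation. This is a minimal sketch, not the authors' implementation: the shaping constants, the episode promotion rule, and all function names are assumptions.

```python
import random
from collections import deque

import numpy as np

def guided_reward(pos, goal, reached, step_penalty=0.01, arrive_bonus=10.0):
    """Dense, distance-shaped reward (an assumption standing in for the paper's
    guided reward): closing on the rendezvous point is penalized less each step,
    so the agent is not left with a single sparse terminal signal."""
    return arrive_bonus if reached else -step_penalty * float(np.linalg.norm(pos - goal))

class TempStorageReplay:
    """Replay memory with a per-episode temporary storage area.

    Transitions accumulate in a temporary buffer during an episode and are
    promoted to the main memory only when the episode ends; the promotion rule
    (keep successful episodes, keep failures with small probability) is a
    hypothetical choice made for this sketch."""

    def __init__(self, capacity=100_000, keep_fail_prob=0.2):
        self.memory = deque(maxlen=capacity)
        self.temp = []
        self.keep_fail_prob = keep_fail_prob

    def push(self, transition):
        self.temp.append(transition)        # stage in the temporary area

    def end_episode(self, success):
        if success or random.random() < self.keep_fail_prob:
            self.memory.extend(self.temp)   # promote the whole episode
        self.temp.clear()

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN target: the online network picks the next action, the target
    network values it, which reduces the overestimation bias of plain DQN."""
    best = np.argmax(q_online_next, axis=1)
    next_val = q_target_next[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * next_val
```

Staging a whole episode before committing it keeps the shaped step rewards and the terminal outcome together in memory, which is one plausible way such a temporary storage area could speed up convergence as the abstract reports.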
Funding: supported by the National Natural Science Foundation of China (No. 52175236), the Research, Development, and Demonstration Application of Heavy-Duty Diesel Vehicle Exhaust Emission Testing Technology program, China (No. 24-8-cspz-18-nsh), and the Qingdao Civic Science and Technology Plan, China (No. 19-6-1-88-nsh).
Abstract: In this paper, a double deep Q-network (DDQN) energy management model based on long short-term memory (LSTM) speed prediction is proposed under a model predictive control (MPC) framework. The initial learning rate and neuron dropout probability of the LSTM speed prediction model are optimized by a genetic algorithm (GA). The prediction results show that the root-mean-square error of the GA-LSTM speed prediction method is smaller than that of the support vector regression (SVR) method over different prediction horizons. The predicted demand power, the state of charge (SOC), and the demand power at the current moment are used as the state input of the agent, and real-time control of the strategy is realized through the MPC method. The simulation results show that the proposed control strategy reduces equivalent fuel consumption by 0.0354 kg compared with DDQN, 0.8439 kg compared with the equivalent consumption minimization strategy (ECMS), and 0.742 kg compared with the power-following control strategy. The difference between the proposed strategy and the dynamic programming control strategy is only 0.0048 kg (0.193%), while the SOC of the power battery remains stable. Finally, hardware-in-the-loop simulation verifies that the proposed control strategy has good real-time performance.
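Below is a minimal sketch of how a GA might tune the LSTM's initial learning rate and dropout probability, the only two hyperparameters the abstract names. Everything else (the network's layer sizes, the truncation selection and Gaussian mutation, and fitness evaluated on the training batch itself rather than a validation split) is an assumption made for illustration; the sketch is written against PyTorch.

```python
import random

import torch
import torch.nn as nn

class SpeedLSTM(nn.Module):
    """Small LSTM mapping a window of past speeds to the next H speeds.
    Layer sizes are assumptions; only the tuned hyperparameters (initial
    learning rate, dropout probability) come from the abstract."""
    def __init__(self, dropout, hidden=64, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, num_layers=2, dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                      # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # predict the next-horizon speeds

def rmse_after_training(lr, dropout, xb, yb, epochs=30):
    """GA fitness: train briefly, return the final RMSE (a stand-in for the
    paper's evaluation over different prediction horizons)."""
    model = SpeedLSTM(dropout)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        opt.step()
    return float(loss.sqrt())

def ga_search(xb, yb, pop=6, gens=5):
    """Minimal GA over (learning rate, dropout): truncation selection plus
    Gaussian mutation -- a sketch, not the authors' operators."""
    genomes = [(10 ** random.uniform(-4, -2), random.uniform(0.0, 0.5)) for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(genomes, key=lambda g: rmse_after_training(*g, xb, yb))
        elite = scored[: pop // 2]
        children = [(max(1e-5, lr * 10 ** random.gauss(0, 0.2)),
                     min(0.9, max(0.0, dp + random.gauss(0, 0.05))))
                    for lr, dp in elite]
        genomes = elite + children
    return min(genomes, key=lambda g: rmse_after_training(*g, xb, yb))

# Toy usage with synthetic speed windows (shapes are assumptions):
# xb = torch.randn(32, 20, 1); yb = torch.randn(32, 5)
# best_lr, best_dropout = ga_search(xb, yb)
```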
Abstract: To adjust the speed limits of different road segments and smooth traffic flow, thereby improving the safety and efficiency of expressway traffic, a smooth speed control system based on deep reinforcement learning is designed for traffic bottleneck areas. The system consists of three modules: dynamic speed-limit activation, speed-limit determination and updating, and dynamic publication on variable message signs. The deep reinforcement learning algorithm DDQN (Double Deep Q-Network) is introduced into the system, and a DDQN-based smooth speed control strategy is proposed that improves the algorithm's performance along two dimensions: the target network and experience replay. Traffic flow on a section of a Ningxia expressway is simulated with the cell transmission model (CTM), and the system's effectiveness is verified using total vehicle travel time and traffic volume as evaluation metrics. The results show that the system improves the traffic efficiency of congested segments within the bottleneck area.
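The simulation backbone can be sketched as a textbook cell-transmission-model step, with the DDQN speed-limit action entering through the free-flow speed of the controlled cells. The parameter names and the discrete speed-limit set below are assumptions; the abstract does not give the values used on the Ningxia section.

```python
import numpy as np

def ctm_step(k, v_free, w, k_jam, q_max, dt, dx, demand):
    """One cell-transmission-model update (a textbook CTM step, used here as a
    stand-in for the paper's simulation; parameter values are assumptions).

    k: vehicle densities per cell (veh/km); v_free: free-flow speed per cell
    (km/h), which the DDQN agent lowers on upstream cells; w: backward wave
    speed (km/h); dt in hours, dx in km; demand in veh/h at the entrance."""
    sending = np.minimum(v_free * k, q_max)          # what each cell can send
    receiving = np.minimum(w * (k_jam - k), q_max)   # what each cell can take
    inflow = np.empty_like(k)
    inflow[0] = min(demand, receiving[0])
    inflow[1:] = np.minimum(sending[:-1], receiving[1:])
    outflow = np.empty_like(k)
    outflow[:-1] = inflow[1:]
    outflow[-1] = sending[-1]                        # free discharge at the exit
    return k + (dt / dx) * (inflow - outflow)

# A discrete action set of speed-limit values (km/h) such as a DDQN agent
# might choose from -- the actual set is not given in the abstract:
SPEED_LIMITS = np.array([60.0, 80.0, 100.0, 120.0])

# One simulation step under a DDQN-chosen limit on the first three cells:
# v = np.full(10, 100.0); v[:3] = SPEED_LIMITS[agent_action]
# k = ctm_step(k, v, w=20.0, k_jam=120.0, q_max=2000.0, dt=10/3600, dx=0.5, demand=1800.0)
```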
Funding: supported by the Key Research and Development Projects of Shaanxi Provincial Department (2017GY-055).
Abstract: To address the problem that the feature data collected for the vehicle-following task in marine scenes are not rich enough, which leads to long model convergence times and high training difficulty, a two-stage vehicle-following system is proposed. First, a semantic segmentation model predicts the number of pixels occupied by the followed target, and this pixel count is mapped to a position feature. Second, a deep reinforcement learning algorithm enables the control equipment to make decision actions that keep the two moving objects within a safe distance. The experimental results show that the two-stage vehicle-following system converges 40% faster than the model without the position feature, and the following stability is significantly improved by adding the position feature.
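A minimal sketch of the first stage's pixel-to-position mapping and a shaped following reward. The abstract only states that the target's pixel count is mapped to a position feature; the log-scaled mapping, the near/far ratios, the class label, and the reward band here are all hypothetical.

```python
import numpy as np

TARGET_CLASS = 1  # hypothetical label id of the followed target in the segmentation mask

def position_feature(seg_mask, img_area, far_ratio=0.002, near_ratio=0.05):
    """Map the followed target's pixel count to a scalar position feature in [0, 1].

    A larger mask means the target fills more of the frame, i.e. it is closer;
    the log scaling and the near/far ratio bounds are assumptions."""
    ratio = np.count_nonzero(seg_mask == TARGET_CLASS) / img_area
    ratio = np.clip(ratio, far_ratio, near_ratio)
    return (np.log(ratio) - np.log(far_ratio)) / (np.log(near_ratio) - np.log(far_ratio))

def following_reward(feature, target=0.5, band=0.1):
    """Illustrative shaped reward for holding the safe-distance band: a bonus
    inside the band, a penalty growing with the drift outside it."""
    err = abs(feature - target)
    return 1.0 if err < band else -err

# Example: a 100x100 mask where the target covers 300 pixels.
mask = np.zeros((100, 100), dtype=np.int64)
mask[40:55, 40:60] = TARGET_CLASS
f = position_feature(mask, mask.size)
r = following_reward(f)
```

Feeding the DRL agent this one scalar alongside the raw observation is consistent with the abstract's finding that the added position feature both speeds up convergence and stabilizes following.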