Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different ...Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.展开更多
针对目前雷达干扰抑制决策智能化程度低的问题,提出了一种基于双深度优先经验回放和可变贪婪算法改进的双重竞争深度Q网络(double dueling deep Q network,D3QN)决策的雷达干扰抑制方法。首先对雷达目标回波和干扰混合信号进行特征提取...针对目前雷达干扰抑制决策智能化程度低的问题,提出了一种基于双深度优先经验回放和可变贪婪算法改进的双重竞争深度Q网络(double dueling deep Q network,D3QN)决策的雷达干扰抑制方法。首先对雷达目标回波和干扰混合信号进行特征提取;然后根据信号特征通过可变贪婪算法选择动作作用于干扰,并将动作前后的信号特征存储于双深度优先经验回放池后,经过学习决策出最优的干扰抑制策略;最后使用该策略抑制干扰后输出。实验结果表明,该方法有效改善了信号的脉压结果,显著提升了信号的信干噪比,相较于基于D3QN的传统干扰抑制方法,在策略准确率和收敛速度上分别提升了7.3%和8.7%。展开更多
As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,t...As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.展开更多
At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown ...At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.展开更多
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision...Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.展开更多
文摘Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.
文摘针对目前雷达干扰抑制决策智能化程度低的问题,提出了一种基于双深度优先经验回放和可变贪婪算法改进的双重竞争深度Q网络(double dueling deep Q network,D3QN)决策的雷达干扰抑制方法。首先对雷达目标回波和干扰混合信号进行特征提取;然后根据信号特征通过可变贪婪算法选择动作作用于干扰,并将动作前后的信号特征存储于双深度优先经验回放池后,经过学习决策出最优的干扰抑制策略;最后使用该策略抑制干扰后输出。实验结果表明,该方法有效改善了信号的脉压结果,显著提升了信号的信干噪比,相较于基于D3QN的传统干扰抑制方法,在策略准确率和收敛速度上分别提升了7.3%和8.7%。
基金supported by the National Key Research and Development Program of China(No.2022YFB4300902).
文摘As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.
文摘At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
基金funded by the Beijing Engineering Research Center of Electric Rail Transportation.
文摘Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.