Journal Articles
5 articles found
1. SPaRM: an efficient exploration and planning framework for sparse reward reinforcement learning
Authors: BAN Jian, LI Gongyan, XU Shaoyun. High Technology Letters (EI, CAS), 2024, No. 4, pp. 344-355 (12 pages).
Due to the issue of long horizons, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, due to the challenge posed by sparse rewards, the planning phase of reinforcement learning consumes a considerable amount of time on repetitive and unproductive tasks before adequately accessing sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: a space partitioning module and a reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase; this work establishes its theoretical sample-complexity lower bound. The latter starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This facilitates the early involvement of the sparse reward at the target in the policy-update process. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
Keywords: reinforcement learning (RL); sparse reward; reward-free exploration (RFE); space partitioning (SP); reverse merging (RM)
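The two SPaRM ingredients described in the abstract above lend themselves to a small illustration. Below is a minimal, hypothetical Python sketch on a toy grid maze: partitioning the state set into subspaces, and planning backward from the goal so the sparse goal signal propagates toward the start. All names and structure are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SPaRM's two ideas on a toy grid maze;
# not the authors' implementation.
from collections import deque

import numpy as np

def partition_states(states, n_parts):
    # Split the state set into n_parts roughly equal subspaces so each can be
    # explored separately (the paper partitions the actual state space; this
    # approximates that crudely by index).
    return np.array_split(np.asarray(states), n_parts)

def reverse_plan(grid, goal):
    # "Reverse merging" in miniature: expand distances outward from the goal,
    # so states near the sparse reward are planned for first and the frontier
    # gradually merges back toward the start state.
    h, w = grid.shape
    dist = np.full((h, w), np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and grid[nr, nc] == 0 \
                    and dist[nr, nc] == np.inf:
                dist[nr, nc] = dist[r, c] + 1.0
                queue.append((nr, nc))
    return dist  # greedy descent on dist yields a goal-reaching policy

grid = np.zeros((8, 8), dtype=int)
grid[3, 1:7] = 1                                  # 0 = free cell, 1 = wall
subspaces = partition_states(np.arange(64), 4)    # 4 subspaces to explore
print(len(subspaces), reverse_plan(grid, (7, 7))[0, 0])  # -> 4 14.0
```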
2. A UAV collaborative defense scheme driven by DDPG algorithm (Cited by: 3)
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 5, pp. 1211-1224 (14 pages).
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme based on DDPG for swarms of unmanned aerial vehicles (UAVs) is developed and validated, showing promising practical value for defense. We address the sparse-reward problem of reinforcement learning in a long-horizon task by building the reward function of the UAV swarm and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy, and promoting the intelligent development of UAV swarms and their decision-making.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicle (UAV) swarm; task decision-making; deep reinforcement learning; sparse reward problem
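As a companion to the abstract above, here is a hedged sketch of the kind of shaped reward its sparse-reward fix suggests: a dense distance-closing term added to the sparse terminal outcome. The weights, the breach radius, and the function names are assumptions for illustration, not the paper's actual reward design.

```python
# Hedged sketch of a dense shaped reward for a UAV defense task; all
# coefficients and the signature are illustrative assumptions.
import numpy as np

def shaped_reward(uav_pos, intruder_pos, asset_pos, intercepted,
                  w_dist=0.1, intercept_bonus=10.0, breach_penalty=-10.0,
                  breach_radius=1.0):
    # Dense term: penalize the UAV-intruder gap every step, so learning
    # progresses long before the sparse terminal event occurs.
    closing = -w_dist * np.linalg.norm(np.asarray(uav_pos)
                                       - np.asarray(intruder_pos))
    if intercepted:                                   # sparse success signal
        return closing + intercept_bonus
    if np.linalg.norm(np.asarray(intruder_pos)
                      - np.asarray(asset_pos)) < breach_radius:
        return closing + breach_penalty               # sparse failure signal
    return closing

print(shaped_reward([0, 0], [3, 4], [10, 10], intercepted=False))  # -> -0.5
```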
3. Hierarchical Reinforcement Learning With Automatic Sub-Goal Identification (Cited by: 1)
Authors: Chenghao Liu, Fei Zhu, Quan Liu, Yuchen Fu. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, No. 10, pp. 1686-1696 (11 pages).
In reinforcement learning, an agent may explore ineffectively when dealing with sparse-reward tasks where finding a reward point is difficult. To solve this problem, we propose hierarchical deep reinforcement learning with automatic sub-goal identification via computer vision (HADS), which takes advantage of hierarchical reinforcement learning to alleviate the sparse-reward problem and improves the efficiency of exploration through a sub-goal mechanism. HADS uses a computer vision method to identify sub-goals automatically for hierarchical deep reinforcement learning. Because not all sub-goal points are reachable, a mechanism is proposed to remove unreachable sub-goal points and further improve the performance of the algorithm. HADS applies contour recognition to identify sub-goals from the state image: salient states in the state image may be recognized as sub-goals, and invalid ones are then removed based on prior knowledge. Our experiments verified the effectiveness of the algorithm.
Keywords: hierarchical control; hierarchical reinforcement learning; option; sparse reward; sub-goal
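The contour-based sub-goal identification described above can be sketched with OpenCV. The snippet below is a minimal illustration, assuming OpenCV (cv2) is installed; the `is_reachable` filter is a hypothetical stand-in for the paper's prior-knowledge-based removal of unreachable sub-goal points.

```python
# Minimal sketch of contour-based sub-goal extraction, assuming OpenCV;
# `is_reachable` is a hypothetical stand-in for prior-knowledge filtering.
import cv2
import numpy as np

def candidate_subgoals(state_image, is_reachable=lambda pt: True):
    # Threshold the state image, find salient contours, and keep the
    # centroids of reachable contours as candidate sub-goals.
    gray = cv2.cvtColor(state_image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    subgoals = []
    for cnt in contours:
        m = cv2.moments(cnt)
        if m["m00"] == 0:                    # degenerate contour, skip
            continue
        centroid = (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))
        if is_reachable(centroid):           # drop unreachable sub-goal points
            subgoals.append(centroid)
    return subgoals

img = np.zeros((64, 64, 3), dtype=np.uint8)
cv2.circle(img, (20, 20), 5, (255, 255, 255), -1)   # one salient blob
print(candidate_subgoals(img))                       # -> [(20, 20)]
```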
4. Prioritization Hindsight Experience Based on Spatial Position Attention for Robots
Authors: Ye Yuan, Yu Sha, Feixiang Sun, Haofan Lu, Shuiping Gou, Jie Luo. Machine Intelligence Research, 2025, No. 1, pp. 160-175 (16 pages).
Sparse rewards pose significant challenges in deep reinforcement learning, as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating "small goals" within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial-position functions to determine the value of each episode for experience replay. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in the challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.
Keywords: hindsight experience replay; spatial position attention; sparse reward; deep reinforcement learning; prioritization hindsight experience
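A rough sketch of spatial-position-based episode prioritization in the spirit of SPAHER follows. The priority function used here (progress of achieved goals toward the desired goal) is an illustrative assumption, not the paper's exact transition and trajectory spatial-position functions.

```python
# Hedged sketch of episode prioritization by spatial position; the priority
# function is an illustrative assumption, not the paper's formulation.
import numpy as np

def episode_priority(achieved_goals, desired_goal):
    # Score an episode by how far its achieved goals progressed toward the
    # desired goal: larger spatial progress -> higher replay priority.
    d = np.linalg.norm(np.asarray(achieved_goals, dtype=float)
                       - np.asarray(desired_goal, dtype=float), axis=1)
    return float(d[0] - d.min())          # best improvement over the episode

def sample_episode(buffer, desired_goal, rng=None):
    # Sample one stored episode with probability proportional to its priority.
    rng = rng or np.random.default_rng()
    p = np.array([episode_priority(ep, desired_goal) for ep in buffer]) + 1e-6
    return buffer[rng.choice(len(buffer), p=p / p.sum())]

buffer = [[[0, 0], [1, 1], [2, 2]],       # moves toward the goal
          [[0, 0], [0, 0], [0, 0]]]       # never moves
print(episode_priority(buffer[0], desired_goal=[3, 3]))  # -> ~2.83
```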
5. Fast-converging Deep Reinforcement Learning for Optimal Dispatch of Large-scale Power Systems Under Transient Security Constraints
Authors: Tannan Xiao, Ying Chen, Han Diao, Shaowei Huang, Chen Shen. Journal of Modern Power Systems and Clean Energy, 2025, No. 5, pp. 1495-1506 (12 pages).
Power system optimal dispatch with transient security constraints is commonly formulated as transient security-constrained optimal power flow (TSC-OPF). Deep reinforcement learning (DRL)-based TSC-OPF trains efficient decision-making agents that are adaptable to various scenarios and provide solutions quickly. However, due to the high dimensionality of the state and action spaces, as well as the non-smoothness of dynamic constraints, existing DRL-based TSC-OPF solution methods face the significant challenge of sparse rewards. To address this issue, a fast-converging DRL method for optimal dispatch of large-scale power systems under transient security constraints is proposed in this paper. The Markov decision process (MDP) modeling of TSC-OPF is improved by reducing the observation space and smoothing the reward design, thus facilitating agent training. An improved deep deterministic policy gradient algorithm with curriculum learning, parallel exploration, and ensemble decision-making (DDPGCL-PE-ED) is introduced to drastically enhance the efficiency of agent training and the accuracy of decision-making. The effectiveness, efficiency, and accuracy of the proposed method are demonstrated through experiments on the IEEE 39-bus system and a practical 710-bus regional power grid. The source code of the proposed method is publicly available on GitHub.
Keywords: large-scale power system; optimal dispatch; transient security; optimal power flow; reinforcement learning; sparse reward
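The reward-smoothing idea described above can be illustrated with a small sketch: replacing a binary stable/unstable signal with a continuous penalty on the transient stability margin. The sigmoid form and all coefficients below are assumptions for illustration, not the paper's actual reward design.

```python
# Hedged sketch of reward smoothing for DRL-based TSC-OPF; the sigmoid form
# and all coefficients are illustrative assumptions.
import math

def smoothed_reward(gen_cost, stability_margin, w_cost=1e-3, w_stab=5.0):
    # Dense reward: negative generation cost plus a smooth penalty that grows
    # as the transient stability margin approaches zero, instead of a sparse
    # penalty issued only when a simulated contingency actually loses stability.
    stab_penalty = w_stab / (1.0 + math.exp(10.0 * stability_margin))
    return -w_cost * gen_cost - stab_penalty

print(smoothed_reward(gen_cost=50_000, stability_margin=0.5))   # secure case
print(smoothed_reward(gen_cost=50_000, stability_margin=-0.2))  # marginal case
```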