Funding: supported by the National Natural Science Foundation of China (Grant No. 62192783) and the Fundamental Research Funds for the Central Universities (No. 020214380108).
Abstract: Exploration strategy design is a challenging problem in reinforcement learning (RL), especially when the environment has a large state space or sparse rewards. During exploration, the agent tries to discover unexplored (novel) areas or high-reward (quality) areas. Most existing methods perform exploration by utilizing only the novelty of states; the novelty and quality in the neighboring area of the current state have not been jointly exploited to guide the agent's exploration. To address this problem, this paper proposes a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration. CRL adopts clustering to divide the collected states into several clusters, based on which the agent is given a bonus reward reflecting both the novelty and the quality of the neighboring area (cluster) of the current state. CRL leverages these bonus rewards to guide the agent to perform efficient exploration. Moreover, CRL can be combined with existing exploration strategies to improve their performance, since the bonus rewards employed by those strategies capture only the novelty of states. Experiments on four continuous control tasks and six hard-exploration Atari-2600 games show that our method outperforms other state-of-the-art methods and achieves the best performance.
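The cluster-based bonus described in the abstract can be illustrated with a minimal sketch. Note that this is not the paper's actual implementation: the specific bonus form (inverse-square-root visit counts for novelty, cluster mean reward for quality), the weighting coefficients `alpha` and `beta`, and the use of fixed cluster centers are all illustrative assumptions; the paper may define these quantities differently.

```python
import numpy as np

def assign_cluster(state, centers):
    """Index of the nearest cluster center (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(centers - state, axis=1)))

def cluster_bonus(state, centers, counts, reward_sums, alpha=1.0, beta=1.0):
    """Hypothetical bonus combining the two signals the abstract names:
    novelty (inverse visit count of the state's cluster) and quality
    (mean extrinsic reward observed in that cluster)."""
    c = assign_cluster(state, centers)
    novelty = alpha / np.sqrt(counts[c] + 1.0)            # count-based novelty
    quality = beta * (reward_sums[c] / max(counts[c], 1))  # mean cluster reward
    return novelty + quality
```

In a full agent, the centers would be refit periodically (e.g., by k-means over recently collected states) and the per-cluster counts and reward sums updated online; the bonus is then added to the environment reward when training the policy.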