Offline Generalized Actor-Critic With Distance Regularization
Authors: Huanting Feng, Yuhu Cheng, Xuesong Wang 《IEEE/CAA Journal of Automatica Sinica》 2026, Issue 1, pp. 57-71 (15 pages)
In order to address the issue of overly conservative offline reinforcement learning (RL) methods that limit the generalization of the policy in the out-of-distribution (OOD) region, this article designs a surrogate target for the OOD value function based on dataset distance and proposes a novel generalized Q-learning mechanism with distance regularization (GQDR). In theory, we not only prove the convergence of GQDR but also ensure that the difference between the Q-value learned by GQDR and its true value is bounded. Furthermore, an offline generalized actor-critic method with distance regularization (OGACDR) is proposed by combining GQDR with the actor-critic learning framework. Two implementations of OGACDR, OGACDR-EXP and OGACDR-SQR, are introduced according to exponential (EXP) and open-square (SQR) distance weight functions, and it has been theoretically proved that OGACDR provides a safe policy improvement. Experimental results on Gym-MuJoCo continuous control tasks show that OGACDR can not only alleviate the overestimation and over-conservatism of the Q-value function but also outperform conservative offline RL baselines.
Keywords: actor-critic, distance regularization, generalized Q-learning, offline reinforcement learning, out-of-distribution (OOD)
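The abstract describes distance-weighted regularization of OOD Q-values: a surrogate target whose pessimism grows with an action's distance from the dataset, using exponential (EXP) or square-root-style (SQR) weight functions. The sketch below is purely illustrative and is not taken from the paper; the specific weight forms, the pessimistic floor of 0, and the interpolation scheme are all assumptions chosen only to convey the general idea.

```python
import numpy as np

def exp_weight(d, beta=1.0):
    # Hypothetical exponential (EXP) weight: trust decays fast with
    # distance d from the dataset support.
    return np.exp(-beta * d)

def sqr_weight(d, beta=1.0):
    # Hypothetical square-root-style (SQR) weight: decays more slowly
    # than EXP, so distant actions are penalized less aggressively.
    return 1.0 / np.sqrt(1.0 + beta * d)

def surrogate_ood_target(q_anchor, d, weight_fn):
    # Illustrative surrogate target for an OOD action: interpolate
    # between an in-distribution anchor Q-value and a pessimistic
    # floor, with the mix controlled by the distance weight.
    pessimistic_floor = 0.0
    w = weight_fn(d)
    return w * q_anchor + (1.0 - w) * pessimistic_floor

# Targets shrink toward the floor as distance from the data grows.
for d in [0.0, 1.0, 5.0]:
    print(round(surrogate_ood_target(10.0, d, exp_weight), 3))
```

Under this toy scheme, an OOD action near the data keeps most of its anchor value, while a far-away action is driven toward the floor, which matches the abstract's stated goal of avoiding both overestimation and blanket over-conservatism.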