
End-To-End Autonomous Driving Decision Model Based on Improved Human Feedback Reinforcement Learning
Abstract: End-to-end autonomous driving is currently a major research focus in the field of intelligent vehicles. Most existing methods rely on manually designed reinforcement learning (RL) reward functions, which limit learning efficiency and generalization in complex driving environments. To address this problem, this paper proposes an end-to-end autonomous driving model based on improved human feedback reinforcement learning. First, an automated reward feedback scheme is constructed that simplifies human preferences with a response ratio estimation method, improving the logical consistency of the driving policy while reducing manual design cost. Second, a pre-training optimization method for the reward function is designed, which accelerates convergence by embedding prior knowledge in the early stage of learning. Finally, a novel data augmentation technique based on diffusion models is proposed, together with a dynamically enhanced reward replacement mechanism; this addresses reward-function overfitting and the smoothness of reward switching, improving the adaptability and robustness of the RL agent in complex scenarios. The method is validated in the CARLA simulator and achieves a driving score of 87±2 on the widely used LeaderBoard benchmark. Compared with existing methods, the proposed model shows better generalization and learning efficiency.
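The abstract does not give the formulation of the response ratio estimation method or of the reward pre-training schedule, so the sketch below illustrates only the generic preference-based reward learning step that human feedback RL methods commonly build on: a Bradley-Terry model fitted to pairwise preferences over trajectory segments. It is not the authors' method. The linear reward model, the three segment features (progress, collision, lane deviation), the scripted labeller standing in for automated feedback, and the optional `w_init` warm start (loosely analogous to embedding prior knowledge before preference learning) are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(w, features):
    """Hypothetical linear reward model: predicted segment return is w . features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def preference_prob(w, feat_a, feat_b):
    """Bradley-Terry model: probability that segment A is preferred over segment B."""
    return sigmoid(segment_return(w, feat_a) - segment_return(w, feat_b))


def train_reward(prefs, dim, lr=0.5, epochs=300, w_init=None):
    """Fit w by full-batch gradient ascent on the Bradley-Terry log-likelihood.

    prefs:  list of (feat_a, feat_b, label), label = 1.0 if A is preferred else 0.0.
    w_init: optional starting weights (hypothetical warm start from a prior reward).
    """
    w = list(w_init) if w_init is not None else [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for feat_a, feat_b, label in prefs:
            p = preference_prob(w, feat_a, feat_b)
            # Gradient of label*log(p) + (1 - label)*log(1 - p) with respect to w.
            for i in range(dim):
                grad[i] += (label - p) * (feat_a[i] - feat_b[i])
        w = [wi + lr * gi / len(prefs) for wi, gi in zip(w, grad)]
    return w


def make_prefs(rng, true_w, n, collision_rate=0.2):
    """Hypothetical scripted labeller that prefers the segment with higher true return.

    Per-segment features (all hypothetical): [progress, collision, lane_deviation].
    """
    prefs = []
    for _ in range(n):
        a = [rng.random(), float(rng.random() < collision_rate), rng.random()]
        b = [rng.random(), float(rng.random() < collision_rate), rng.random()]
        label = 1.0 if segment_return(true_w, a) > segment_return(true_w, b) else 0.0
        prefs.append((a, b, label))
    return prefs


def ranking_accuracy(w, prefs):
    """Fraction of pairs the learned reward ranks the same way as the labeller."""
    hits = sum((preference_prob(w, a, b) > 0.5) == (label == 1.0) for a, b, label in prefs)
    return hits / len(prefs)


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [1.0, -5.0, -0.5]  # hidden weights used only to generate labels
    train, held_out = make_prefs(rng, true_w, 300), make_prefs(rng, true_w, 300)
    w = train_reward(train, dim=3)
    print("learned weights:", [round(x, 2) for x in w])
    print("held-out ranking accuracy:", ranking_accuracy(w, held_out))
```

Only the ranking induced by the learned reward matters here: the Bradley-Terry likelihood depends on return differences, so the recovered weights match the labeller's preferences up to scale rather than reproducing `true_w` exactly.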
Authors: Cao Wuhong, Cai Yingfeng, Liu Ze, Liu Qingchao, Wang Hai, Chen Long, Zhang Xiaodong (Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013; School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013; Geely Automotive Research Institute (Ningbo) Co., Ltd., Ningbo 315000)
Source: Automotive Engineering (《汽车工程》, Peking University core journal), 2026, No. 1, pp. 24-36 (13 pages)
Funding: Supported by the National Key R&D Program of China (2022YFB2503302) and the National Natural Science Foundation of China (52225212, 52272418, U22A20100).
Keywords: end-to-end autonomous driving; human feedback reinforcement learning; response ratio; diffusion model