End-to-End Autonomous Driving Decision Model Based on Improved Human Feedback Reinforcement Learning

Abstract: End-to-end autonomous driving is a current research hotspot in the field of intelligent vehicles. Most existing methods rely on manually designed reinforcement learning (RL) reward functions, which limits learning efficiency and generalization in complex driving environments. To address this problem, this paper proposes an end-to-end autonomous driving modeling method based on improved human feedback reinforcement learning. First, an automated reward feedback system is constructed that simplifies human preferences using a response ratio estimation method, improving the logical consistency of the driving policy while reducing manual design cost. Second, a pre-training optimization method for the reward function is designed, which accelerates model convergence by embedding prior knowledge in the early stage of learning. Finally, a novel data augmentation technique based on diffusion models is proposed and a dynamically enhanced reward replacement mechanism is established, addressing reward function overfitting and switching smoothness and improving the adaptability and robustness of the RL agent in complex scenarios. The method is validated in the CARLA simulator and achieves a driving score of 87±2 on the widely used LeaderBoard benchmark, showing better generalization and learning efficiency than existing methods.

Authors: Cao Wuhong; Cai Yingfeng; Liu Ze; Liu Qingchao; Wang Hai; Chen Long; Zhang Xiaodong

Affiliations: Institute of Automotive Engineering, Jiangsu University, Zhenjiang 212013; School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013; Geely Automotive Research Institute (Ningbo) Co., Ltd., Ningbo 315000

Source: Automotive Engineering (《汽车工程》, PKU Core Journal), 2026, No. 1, pp. 24-36 (13 pages)

Funding: Supported by the National Key R&D Program of China (2022YFB2503302) and the National Natural Science Foundation of China (52225212, 52272418, U22A20100).

Keywords: end-to-end autonomous driving; human feedback reinforcement learning; response ratio; diffusion model

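Classification codes: U463.6 [Mechanical Engineering: Vehicle Engineering]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]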

