Controlling underestimation bias in reinforcement learning via minmax operation (cited by: 1)


Abstract: Obtaining accurate value estimates and reducing estimation bias are key issues in reinforcement learning. However, current methods that address the overestimation problem tend to introduce underestimation, which poses a challenge for precise decision-making in many fields. To address this issue, we conduct a theoretical analysis of the underestimation bias and propose the minmax operation, which allows flexible control of the estimation bias. Specifically, we select the maximum value of each action from multiple parallel state-action networks to create a new state-action value sequence. Then, a minimum value is selected to obtain more accurate value estimates. Moreover, based on the minmax operation, we propose two novel algorithms by combining it with Deep Q-Network (DQN) and Double DQN (DDQN), named minmax-DQN and minmax-DDQN. We also conduct theoretical analyses of the estimation bias and variance caused by the proposed minmax operation, which show that this operation significantly reduces both underestimation and overestimation biases and leads to unbiased estimation. Furthermore, the variance is also reduced, which helps improve the stability of network training. Finally, we conduct numerous comparative experiments in various environments, which empirically demonstrate the superiority of our method.
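The two steps named in the abstract (a per-action maximum across parallel networks, followed by a minimum over the resulting sequence) can be illustrated with a minimal sketch. This follows the abstract's wording literally and is not taken from the paper itself; the exact formulation in minmax-DQN and minmax-DDQN may differ. The function names `minmax_value` and `td_target`, the array layout, and the sample numbers are all hypothetical.

```python
import numpy as np


def minmax_value(q_values: np.ndarray) -> float:
    """Apply the two steps described in the abstract to one state.

    q_values is assumed to have shape (num_networks, num_actions):
    row k holds the estimates of the k-th parallel network.
    """
    # Step 1: for each action, keep the largest estimate across networks,
    # which yields a new state-action value sequence of length num_actions.
    per_action_max = q_values.max(axis=0)
    # Step 2: select the minimum of that sequence.
    return float(per_action_max.min())


def td_target(reward: float, gamma: float, q_next: np.ndarray,
              done: bool = False) -> float:
    """Hypothetical one-step bootstrap target built on minmax_value."""
    if done:
        return reward
    return reward + gamma * minmax_value(q_next)


# Illustrative numbers only: two networks, three actions.
q_next = np.array([[1.0, 2.0, 0.5],
                   [1.5, 1.0, 0.8]])
print(minmax_value(q_next))  # per-action maxima are [1.5, 2.0, 0.8], so 0.8
print(td_target(1.0, 0.5, q_next))
```

How the minmax value enters the actual DQN and DDQN updates is not specified in the abstract, so `td_target` should be read only as one plausible way to plug the operation into a bootstrap target.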

Authors: Fanghui HUANG, Yixin HE, Yu ZHANG, Xinyang DENG, Wen JIANG

Affiliations: School of Electronics and Information; College of Information Science and Engineering

Source: Chinese Journal of Aeronautics (中国航空学报, English edition), indexed in SCIE, EI, CAS, CSCD; 2024, No. 7, pp. 406-417 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 62173272).

Keywords: Reinforcement learning; Minmax operation; Estimation bias; Underestimation bias; Variance

Classification: V279 [Aerospace Science and Technology: Flight Vehicle Design]; V249 [Aerospace Science and Technology: Flight Vehicle Design]

