Journal Article

Context-Based Deep Meta Reinforcement Learning for Active Pantograph Control in High-Speed Railway
Abstract: In high-speed railways, fluctuations of the pantograph-catenary contact force severely degrade the current collection quality of high-speed trains, wear out contact components, sharply increase operating costs, and can even compromise operational safety. Active pantograph control is an effective measure for improving current collection quality; however, existing control methods are limited by pantograph actuation delays and the lack of an accurate mathematical model, so controller performance is poor and rapid adaptation to new pantograph-catenary system operating conditions and environmental disturbances is lacking. This paper proposes a context-based deep meta-reinforcement learning algorithm to address these limitations, enabling the agent to adapt quickly in new environments or new tasks. First, an improved distributed SAC (soft actor-critic) algorithm, including a distributed state-action value function and dual-value distribution learning, is proposed to solve the overestimation problem in value estimation and to stabilize the training process. Then, a task encoder sensitive to environmental changes is proposed, which can quickly generate the optimal task encoding from the context of interaction samples. Finally, the effectiveness and robustness of the proposed method are evaluated on a validated pantograph-catenary system benchmark and on an active-pantograph experiment platform; compared with state-of-the-art active pantograph control methods, the proposed method achieves the lowest contact-force variance and adapts quickly to new control tasks and environmental disturbances.

Active pantograph control is the most promising technique for reducing contact force (CF) fluctuation and improving the train's current collection quality. The train's high-speed operation causes wave propagation and nonlinear dynamics, making it challenging to maintain a suitable and stable contact force. Scholars have proposed numerous control strategies for the pantograph-catenary system (PCS) in recent years, including proportional-integral-derivative (PID) control, sliding mode control, feedback control, optimal control, robust control, etc. These control strategies often achieve good results in single simulation scenes. Existing solutions, however, suffer from three significant limitations: (1) they cannot deal well with the various pantograph types, catenary line operating conditions, changing operating speeds, and contingencies; (2) they are difficult to implement in practical systems due to the lack of rapid adaptability to new PCS operating conditions and environmental disturbances; (3) it is particularly difficult to characterize the sensor accuracy, actuator uncertainty, railway line parameters, and external excitations, because all of these factors can drift over time. High-speed railway systems with widely varying operating conditions further increase this uncertainty.

In this paper, we improve the current collection quality by developing and applying a context-based deep meta-reinforcement learning (CB-DMRL) approach to learn and fine-tune the control strategy. It combines an improved distributed soft actor-critic algorithm with an environment-sensitive task encoder to train a meta-policy that can quickly adapt to different PCS operating conditions and environmental disturbances. Firstly, an improved distributed soft actor-critic algorithm, including a distributed state-action value function and dual-value distribution learning, is proposed to solve the overestimation problem in value estimation and to stabilize the training process. Secondly, the proposed method allows the environment-sensitive task encoder and the well-trained agent to adapt to new tasks quickly and efficiently, even in unseen tasks and non-stationary environments. Finally, a validated nonlinear pantograph-catenary system model is established, based on finite element and multi-body dynamics theory, as the simulation environment for DRL. The state space, action space, and reward function are redesigned to train the meta-agent.

We evaluated the CB-DMRL algorithm's performance on a proven PCS model and on an active-pantograph hardware-in-the-loop (HIL) experiment platform. The experimental results demonstrate that meta-trained DRL policies with a latent task space swiftly adapt to new operating conditions and unknown perturbations. The meta-agent adapts quickly, reaching a high reward after two iterations, which require only 10 steps. The standard deviation of the contact force is reduced by 14.71%, 16.93%, 21.22%, and 35.69% at 320 km/h, 340 km/h, 360 km/h, and 380 km/h, respectively. The faster the train runs, the better the control effect: high train speeds cause strong pantograph-catenary system vibration, and our method effectively improves the current collection quality in precisely those conditions. Even in unknown scenarios, the proposed approach can adapt the well-trained behavior policy to the new task using the environment-sensitive task encoder. Given the rapidly changing PCS operating conditions and unknown environmental disturbances, we believe this research is a significant advance in applying DRL to PCS control.

The following conclusions can be drawn from the simulation analysis: (1) Among the algorithm components of this paper, the improved distributed SAC algorithm can effectively improve the value estimation accuracy and stabilize the training process, mitigating exploding and vanishing gradients; the task encoder, sensitive to environmental changes, can quickly generate the optimal task encoding based on the context of the interaction samples. (2) Compared with the most advanced active pantograph control methods, the method proposed in this paper achieves the lowest contact-force variance and can quickly adapt to new control tasks and environmental disturbances. The deep meta-reinforcement learning algorithm used in this paper has achieved good results in multi-scenario active pantograph control. However, how to scientifically and effectively update the meta-policy parameters under the limited computing power of scenarios such as running vehicles remains to be explored in future work.
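The "dual-value" idea described in the abstract, keeping two value estimates and acting on the more conservative one to curb overestimation, can be illustrated with a minimal sketch. This is a hypothetical toy in plain NumPy, not the paper's implementation: the linear "critics", the quantile count, and all names are assumptions for illustration; it shows only the per-quantile minimum over two distributional critics, a distributional analogue of clipped double-Q learning.

```python
import numpy as np

rng = np.random.default_rng(0)

N_QUANTILES = 8  # number of quantile atoms per critic (assumed for illustration)

def critic(params, state, action):
    """Toy linear 'distributional critic': maps a (state, action) pair to
    N_QUANTILES quantile estimates of the return distribution."""
    x = np.concatenate([state, action])
    return params @ x  # shape: (N_QUANTILES,)

def conservative_value(params1, params2, state, action):
    """Dual-value trick: evaluate both critics and keep, quantile by
    quantile, the smaller estimate, so the target never exceeds either
    critic's own estimate -- this is what curbs overestimation bias."""
    q1 = critic(params1, state, action)
    q2 = critic(params2, state, action)
    return np.minimum(q1, q2)

# Hypothetical PCS-like inputs: a small observation vector (e.g. contact-force
# related measurements) and a scalar pantograph actuator command.
state = rng.normal(size=4)
action = rng.normal(size=1)
p1 = rng.normal(size=(N_QUANTILES, 5))
p2 = rng.normal(size=(N_QUANTILES, 5))

v = conservative_value(p1, p2, state, action)
# The conservative estimate never lies above either individual critic:
assert np.all(v <= critic(p1, state, action))
assert np.all(v <= critic(p2, state, action))
```

In a full SAC implementation the critics would be neural networks trained against a temporal-difference target built from this conservative estimate; the sketch isolates only the value-combination step.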
Authors: Wang Hui; Peng Yuxiang; Chu Wenping; Song Yang; Liu Zhigang (School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China; China Railway Construction Electrification Bureau Group Co., Ltd., Beijing 100043, China; Leeds-SWJTU Joint School, Southwest Jiaotong University, Chengdu 611756, China)
Source: Transactions of China Electrotechnical Society (《电工技术学报》, Peking University Core Journal), 2026, No. 5, pp. 1680-1695 (16 pages)
Keywords: high-speed railway; active pantograph control; deep reinforcement learning; meta-learning