Journal Article

Context-Based Deep Meta Reinforcement Learning for Active Pantograph Control in High-Speed Railway
Abstract: In high-speed railways, fluctuations of the pantograph-catenary contact force severely degrade the current collection quality of high-speed trains, wear out contact components, sharply increase operating costs, and can even compromise operational safety. Active pantograph control is an effective measure for improving current collection quality; however, existing control methods are limited by pantograph actuation delays and the lack of an accurate mathematical model, so controller performance is poor and rapid adaptation to new pantograph-catenary system operating conditions and environmental disturbances is lacking. This paper proposes a context-based deep meta-reinforcement learning algorithm to address these limitations, enabling the agent to adapt quickly in new environments or new tasks. First, an improved distributed SAC (soft actor-critic) algorithm, including a distributed state-action value function and dual-value distribution learning, is proposed to solve the overestimation problem in value estimation and to stabilize the training process. Then, a task encoder sensitive to environmental changes is proposed, which can quickly generate the optimal task encoding from the context of interaction samples. Finally, the effectiveness and robustness of the proposed method are evaluated on a validated pantograph-catenary system benchmark and on an active-pantograph experiment platform; compared with state-of-the-art active pantograph control methods, the proposed method achieves the lowest contact-force variance and adapts quickly to new control tasks and environmental disturbances.

Active pantograph control is the most promising technique for reducing contact force (CF) fluctuation and improving the train's current collection quality. The train's high-speed operation causes wave propagation and nonlinear dynamics, making it challenging to maintain a suitable and stable contact force. Scholars have proposed numerous control strategies for the pantograph-catenary system (PCS) in recent years, including proportional-integral-derivative (PID) control, sliding mode control, feedback control, optimal control, robust control, etc. These control strategies often achieve good results in single simulation scenes. Existing solutions, however, suffer from three significant limitations: (1) they cannot deal well with the various pantograph types, catenary line operating conditions, changing operating speeds, and contingencies; (2) they are difficult to implement in practical systems due to the lack of rapid adaptability to new PCS operating conditions and environmental disturbances; (3) it is particularly difficult to characterize the sensor accuracy, actuator uncertainty, railway line parameters, and external excitations, because all of these factors can drift over time. High-speed railway systems with widely varying operating conditions further increase this uncertainty.

In this paper, we improve the current collection quality by developing and applying a context-based deep meta-reinforcement learning (CB-DMRL) approach to learn and fine-tune the control strategy. It combines an improved distributed soft actor-critic algorithm with an environment-sensitive task encoder to train a meta-policy that can quickly adapt to different PCS operating conditions and environmental disturbances. Firstly, an improved distributed soft actor-critic algorithm, including a distributed state-action value function and dual-value distribution learning, is proposed to solve the overestimation problem in value estimation and to stabilize the training process. Secondly, the proposed method allows the environment-sensitive task encoder and the well-trained agent to adapt to new tasks quickly and efficiently, even in unseen tasks and non-stationary environments. Finally, a validated nonlinear pantograph-catenary system model is established, based on finite element and multi-body dynamics theory, as the simulation environment for DRL. The state space, action space, and reward function are redesigned to train the meta-agent.

We evaluated the CB-DMRL algorithm's performance on a proven PCS model and on an active-pantograph hardware-in-the-loop (HIL) experiment platform. The experimental results demonstrate that meta-trained DRL policies with a latent task space swiftly adapt to new operating conditions and unknown perturbations. The meta-agent adapts quickly, reaching a high reward after two iterations, which require only 10 steps. The standard deviation of the contact force is reduced by 14.71%, 16.93%, 21.22%, and 35.69% at 320 km/h, 340 km/h, 360 km/h, and 380 km/h, respectively. The faster the train runs, the better the control effect: high train speeds cause strong pantograph-catenary system vibration, and our method effectively improves the current collection quality in precisely those conditions. Even in unknown scenarios, the proposed approach can adapt the well-trained behavior policy to the new task using the environment-sensitive task encoder. Given the rapidly changing PCS operating conditions and unknown environmental disturbances, we believe this research is a significant advance in applying DRL to PCS control.

The following conclusions can be drawn from the simulation analysis: (1) Among the algorithm components of this paper, the improved distributed SAC algorithm can effectively improve the value estimation accuracy and stabilize the training process, mitigating exploding and vanishing gradients; the task encoder, sensitive to environmental changes, can quickly generate the optimal task encoding based on the context of the interaction samples. (2) Compared with the most advanced active pantograph control methods, the method proposed in this paper achieves the lowest contact-force variance and can quickly adapt to new control tasks and environmental disturbances. The deep meta-reinforcement learning algorithm used in this paper has achieved good results in multi-scenario active pantograph control. However, how to scientifically and effectively update the meta-policy parameters under the limited computing power of scenarios such as running vehicles remains to be explored in future work.
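The "dual-value" idea described in the abstract, keeping two value estimates and acting on the more conservative one to curb overestimation, can be illustrated with a minimal sketch. This is a hypothetical toy in plain NumPy, not the paper's implementation: the linear "critics", the quantile count, and all names are assumptions for illustration; it shows only the per-quantile minimum over two distributional critics, a distributional analogue of clipped double-Q learning.

```python
import numpy as np

rng = np.random.default_rng(0)

N_QUANTILES = 8  # number of quantile atoms per critic (assumed for illustration)

def critic(params, state, action):
    """Toy linear 'distributional critic': maps a (state, action) pair to
    N_QUANTILES quantile estimates of the return distribution."""
    x = np.concatenate([state, action])
    return params @ x  # shape: (N_QUANTILES,)

def conservative_value(params1, params2, state, action):
    """Dual-value trick: evaluate both critics and keep, quantile by
    quantile, the smaller estimate, so the target never exceeds either
    critic's own estimate -- this is what curbs overestimation bias."""
    q1 = critic(params1, state, action)
    q2 = critic(params2, state, action)
    return np.minimum(q1, q2)

# Hypothetical PCS-like inputs: a small observation vector (e.g. contact-force
# related measurements) and a scalar pantograph actuator command.
state = rng.normal(size=4)
action = rng.normal(size=1)
p1 = rng.normal(size=(N_QUANTILES, 5))
p2 = rng.normal(size=(N_QUANTILES, 5))

v = conservative_value(p1, p2, state, action)
# The conservative estimate never lies above either individual critic:
assert np.all(v <= critic(p1, state, action))
assert np.all(v <= critic(p2, state, action))
```

In a full SAC implementation the critics would be neural networks trained against a temporal-difference target built from this conservative estimate; the sketch isolates only the value-combination step.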
Authors: Wang Hui; Peng Yuxiang; Chu Wenping; Song Yang; Liu Zhigang (School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China; China Railway Construction Electrification Bureau Group Co., Ltd., Beijing 100043, China; Leeds-SWJTU Joint School, Southwest Jiaotong University, Chengdu 611756, China)
Source: Transactions of China Electrotechnical Society (《电工技术学报》, Peking University Core Journal), 2026, No. 5, pp. 1680-1695 (16 pages)
Keywords: high-speed railway; active pantograph control; deep reinforcement learning; meta-learning