
A Reinforcement Learning Signal Control Method Based on Dynamic Decision Intervals in Mixed Traffic Environments
Abstract  Connected and Automated Vehicles (CAV) offer novel data sources and optimization opportunities for traffic signal control. However, existing methods are generally limited in two aspects: first, most rely on fixed decision intervals, which struggle to adapt to the dynamic variation of traffic flow, leading to insufficient global optimality of the control strategy; second, they lack in-depth modeling of the complex interaction characteristics of mixed traffic flow in low-penetration scenarios, which restricts robustness in practical applications. To address these issues, this paper proposes a dynamic decision-interval signal control method based on Proximal Policy Optimization (PPO). The approach first constructs a multi-source traffic state representation that integrates information from both CAV and Regular Vehicles (RV) by employing Convolutional Neural Networks (CNN) and a multi-head attention mechanism. It then designs a multi-discrete action space that combines dynamic decision intervals with phase selection to adaptively generate signal control strategies, balancing decision efficiency and control flexibility. In the reward function, a multi-objective adaptive weighting mechanism over cumulative delay, queue length, and delay standard deviation is introduced to jointly optimize traffic efficiency and fairness. Simulation tests on a real-world road network demonstrate the control effectiveness of the proposed model. The results indicate that under varying traffic demands, the proposed method reduces both average waiting time and average queue length by over 8.50% compared with traditional discrete control methods. Notably, the method maintains stable control performance even when the CAV penetration rate is as low as 20%, validating its effectiveness and strong adaptability in mixed traffic environments.
Authors  WANG Fujian; MA Jiahao; LI Tinghao; MA Dongfang (Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; Institute of Ocean Sensing and Networking, Ocean College, Zhejiang University, Zhoushan 316021, Zhejiang, China)
Source  Journal of Transportation Systems Engineering and Information Technology (《交通运输系统工程与信息》, Peking University Core journal), 2026, No. 1, pp. 45-54 (10 pages)
Funding  National Natural Science Foundation of China (52172334); Open Project of the Zhejiang Provincial Engineering Research Center of Intelligent Transportation (2023ERCITZJ-KF09)
Keywords  intelligent transportation; traffic engineering; deep reinforcement learning; mixed traffic environment; dynamic decision interval; traffic signal control
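The multi-discrete action space (decision interval plus phase selection) and the adaptively weighted reward described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the phase and interval sets, the `decode_action` mapping, and the inverse-magnitude weighting rule are all hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Candidate signal phases and decision intervals (illustrative values;
# the paper's actual sets are not specified in the abstract).
PHASES = [0, 1, 2, 3]            # four candidate signal phases
INTERVALS = [5, 10, 15, 20]      # candidate decision intervals, in seconds


def decode_action(action):
    """Map a multi-discrete action (phase_idx, interval_idx) to a
    concrete (phase, decision-interval) pair for the controller."""
    phase_idx, interval_idx = action
    return PHASES[phase_idx], INTERVALS[interval_idx]


def adaptive_weights(metrics, eps=1e-6):
    """One simple adaptive-weighting scheme (an assumption): weight each
    objective by the inverse of its current magnitude, normalized to sum
    to 1, so that no single term dominates the reward."""
    inv = 1.0 / (np.abs(np.asarray(metrics, dtype=float)) + eps)
    return inv / inv.sum()


def reward(cum_delay, queue_len, delay_std, weights):
    """Negative weighted sum of cumulative delay, queue length, and the
    standard deviation of delay (the fairness term)."""
    w1, w2, w3 = weights
    return -(w1 * cum_delay + w2 * queue_len + w3 * delay_std)
```

In a PPO setup, the policy would emit one index per sub-action (e.g. a Gymnasium `MultiDiscrete([4, 4])` space), and the weights would be recomputed from recent traffic metrics before each reward evaluation.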