A Policy Gradient Reinforcement Learning Algorithm for High-Speed Railway Dynamic Scheduling

Author:

Affiliation:

1. Northeastern University; 2. Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited

Author biography:

Corresponding author:

CLC number: TP273

Fund project: The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)

Abstract:

High-speed railways have developed rapidly in recent years owing to their large transport capacity, high speed, and all-weather operation. However, unexpected events such as severe weather cause train delays, and these delays can propagate through the rail network; the resulting domino effect can leave large numbers of trains unable to run according to the planned timetable. The current dynamic scheduling practice, which relies on dispatchers' manual experience, cannot meet the practical requirement for fast optimization and adjustment. This paper therefore addresses the dynamic scheduling problem for high-speed trains delayed by unexpected events. Taking the minimum total arrival and departure delay of all trains at all stations as the optimization objective, it formulates a mixed-integer nonlinear programming (MINLP) model that allows trains to overtake one another, and proposes a policy gradient reinforcement learning method for dynamic train scheduling. The method comprises the construction of the interactive environment, the definition of the agent's state and action sets, the design of the policy network and the action selection mechanism, and the construction of the reward function; in addition, two problem-specific improvements to the REINFORCE algorithm, error amplification and threshold setting, are introduced. Finally, simulations examine the algorithm's convergence and the performance gains from the two improvements, and the method is compared with Q-Learning. The results show that the proposed method can effectively reschedule high-speed trains, minimize the delay impact of unexpected events, and thereby improve train operation efficiency.
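The abstract describes REINFORCE with two problem-specific tweaks, "error amplification" and "threshold setting". As an illustration only, the sketch below applies a REINFORCE update with a scaled advantage (standing in for error amplification) and an update-skipping threshold to a toy one-step decision problem; the paper's actual formulas, policy network structure, and railway environment are not given here, so every detail beyond the generic REINFORCE rule is an assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, state, action, G, baseline,
                     lr=0.1, amplify=2.0, threshold=0.01):
    """One REINFORCE step: theta += lr * amplify * (G - b) * grad log pi(a|s).
    `amplify` ("error amplification") and `threshold` ("threshold setting")
    are illustrative stand-ins for the paper's improvements."""
    advantage = G - baseline
    if abs(advantage) < threshold:
        return theta                       # skip negligible updates
    probs = softmax(theta @ state)
    grad_log = -np.outer(probs, state)     # d/dtheta log pi = (onehot(a) - pi) x s
    grad_log[action] += state
    return theta + lr * amplify * advantage * grad_log

# Toy one-step problem: one state, two actions; action 0 yields reward 1.
rng = np.random.default_rng(0)
state = np.array([1.0])
theta = np.zeros((2, 1))                   # one logit row per action
baseline = 0.0                             # running-average reward baseline
for _ in range(2000):
    probs = softmax(theta @ state)
    a = rng.choice(2, p=probs)
    G = 1.0 if a == 0 else 0.0             # episode return
    baseline = 0.9 * baseline + 0.1 * G
    theta = reinforce_update(theta, state, a, G, baseline)

probs = softmax(theta @ state)             # policy now strongly prefers action 0
```

The same update shape carries over to the scheduling setting, where the state would encode train positions and delays and each action a dispatching decision; here the running-average baseline plays the role of the learned baseline a full implementation would use.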

History
  • Received: 2021-04-20
  • Revised: 2021-06-14
  • Accepted: 2021-06-17
  • Published online: 2021-07-01
  • Publication date: