School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
To meet the comfort requirements of the car-following mode of adaptive cruise control systems while accounting for vehicle safety and driving efficiency, and to address the poor generalization and comfort of existing algorithms, a new multi-objective vehicle-following decision algorithm is proposed based on the deep deterministic policy gradient (DDPG) algorithm. Based on the mutual longitudinal kinematics of the following vehicle and the lead vehicle, a Markov decision process (MDP) model of the car-following process is established. Combined with a minimum safe distance model, an efficient, comfortable, and safe vehicle-following decision algorithm is designed. To speed up model convergence, the storage and sampling strategy for experience samples in the DDPG algorithm is improved: samples are stored and drawn in separate classes according to their importance. To handle the multi-objective structure of the car-following task, the reward function is designed in a modular fashion. Finally, the algorithm is tested in a simulation environment; even when the test environment differs from the training environment, it completes the following task successfully and outperforms existing following algorithms.
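The abstract mentions improving DDPG's experience-sample storage and sampling by classifying samples according to their importance. A minimal sketch of one such classified replay buffer is shown below; the reward-magnitude importance criterion, the fixed mixing ratio, and all names here are illustrative assumptions, not the paper's exact design.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Sketch of importance-classified experience replay.

    Transitions judged "important" (here: large reward magnitude, an
    assumed proxy) go into a separate pool; each minibatch mixes the
    two pools at a fixed ratio so informative samples are replayed
    more often than they would be under uniform sampling.
    """

    def __init__(self, capacity=10000, importance_threshold=1.0,
                 important_ratio=0.5):
        self.normal = deque(maxlen=capacity)
        self.important = deque(maxlen=capacity)
        self.threshold = importance_threshold
        self.ratio = important_ratio

    def store(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        # Classify by reward magnitude (assumed importance measure).
        if abs(reward) >= self.threshold:
            self.important.append(transition)
        else:
            self.normal.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from the important pool, the rest
        # from the normal pool, then shuffle the combined minibatch.
        n_imp = min(int(batch_size * self.ratio), len(self.important))
        n_norm = min(batch_size - n_imp, len(self.normal))
        batch = (random.sample(self.important, n_imp)
                 + random.sample(self.normal, n_norm))
        random.shuffle(batch)
        return batch
```

Compared with uniform replay, this biases training toward rare, high-signal transitions (e.g. near-collision events during car following), which is one plausible way to realize the convergence speed-up the abstract claims.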
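The abstract also describes a modular reward function covering the multi-objective structure of car following. A hedged sketch of such a modular design is given below, with one module per objective (safety, efficiency, comfort); all thresholds, weights, and function names are assumptions for illustration only.

```python
def safety_reward(gap, min_safe_gap):
    """Flat penalty when the gap falls below the minimum safe
    distance (assumed stand-in for the paper's safety-distance model)."""
    return -1.0 if gap < min_safe_gap else 0.0


def efficiency_reward(gap, desired_gap):
    """Penalize deviation from the desired following distance,
    normalized so the term is 0 at the target gap."""
    return -abs(gap - desired_gap) / desired_gap


def comfort_reward(jerk, max_jerk=3.0):
    """Penalize large jerk (rate of change of acceleration),
    capped at -1, as a simple ride-comfort proxy."""
    return -min(abs(jerk) / max_jerk, 1.0)


def total_reward(gap, jerk, min_safe_gap=5.0, desired_gap=20.0,
                 w_safe=1.0, w_eff=0.5, w_comf=0.5):
    """Weighted sum of the reward modules; weights are illustrative."""
    return (w_safe * safety_reward(gap, min_safe_gap)
            + w_eff * efficiency_reward(gap, desired_gap)
            + w_comf * comfort_reward(jerk))
```

Keeping each objective in its own module makes the trade-off between safety, efficiency, and comfort explicit in the weights, and lets individual terms be tuned or replaced without touching the others.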