Modeling Car Following Behavior of Autonomous Driving Vehicles Based on Deep Reinforcement Learning

CHEN Yue; JIAO Pengpeng; BAI Ruyu; LI Rujian

doi: 10.3963/j.jssn.1674-4861.2023.02.007

Beijing Key Laboratory of General Aviation Technology, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

Funding:

National Natural Science Foundation of China 52172301

National Social Science Fund of China 21ZAD029

Beijing Social Science Fund 21GLA010

About the author:
CHEN Yue (b. 1996), master's student. Research interests: intelligent transportation, autonomous driving. E-mail: chenyue_bucea@163.com

Corresponding author:
JIAO Pengpeng (b. 1980), Ph.D., professor. Research interests: intelligent transportation, traffic management, transportation planning and management, traffic safety. E-mail: jiaopengpeng@bucea.edu.cn

CLC number: U491.2+5
Publication history
- Received: 2022-09-14
- Published online: 2023-06-19

Abstract: To improve the car-following performance of autonomous vehicles and mitigate the negative effects of traffic oscillations, this study investigates a car-following model for automated driving based on deep reinforcement learning. The existing reward function design is extended to account for energy consumption, with the energy-related term built on the VT-Micro model. The method of using the time gap to build the driving-efficiency term is also refined: a virtual speed is added to avoid computation overflow and unrealistically short following distances in traffic oscillation scenarios. To overcome the limitation of earlier oscillation-mitigation studies, which trained only on simulated closed ring roads and simulated vehicle trajectories, the training environment is built from human driver behavior observed during traffic oscillations in the NGSIM trajectory data. The twin delayed deep deterministic policy gradient algorithm (TD3) is then applied to train a multi-objective car-following model. A performance evaluation framework is established to compare the TD3 model with traditional models in two test scenarios: car following and traffic oscillation. Results for the car-following scenario show that the TD3 model and the traditional adaptive cruise control (ACC) model perform similarly in comfort and driving efficiency, and both outperform human drivers. In terms of safety, the TD3 model reduces safety hazards by 53.65% compared with the traditional ACC model and by 36.24% compared with human drivers. In terms of energy consumption, the TD3 model consumes 6.73% less than the traditional ACC model and 15.65% less than human drivers. Results for the traffic oscillation scenario show that the TD3 model can effectively reduce the negative impacts of traffic oscillations. At a 100% TD3 model penetration rate, compared with a fully human-driven environment, driving discomfort decreases by 55.95%, driving efficiency increases by 8.82%, safety hazards decrease by 73.21%, and fuel consumption drops by 5.97%.
- intelligent transportation /
- autonomous vehicle /
- reinforcement learning /
- car following model /
- traffic oscillation
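The abstract states that a virtual speed is added to the time gap to avoid overflow, and that safety is assessed with indicators such as iTTC. The page does not give the formulas, so the following is only a minimal sketch under stated assumptions: the virtual speed is assumed to be added to the follower speed in the denominator of the time gap, iTTC is taken in its standard form (closing speed divided by spacing), and all function names and the example virtual speed of 2 m/s are hypothetical.

```python
def time_gap(spacing_m, follower_speed_mps, virtual_speed_mps=0.0):
    """Time gap in seconds: spacing / (follower speed + virtual speed).

    Assumption: the virtual speed enters the denominator so that the ratio
    stays finite when the follower is nearly stopped in an oscillation.
    """
    denominator = follower_speed_mps + virtual_speed_mps
    if denominator <= 0.0:
        raise ValueError("speed plus virtual speed must be positive")
    return spacing_m / denominator


def inverse_ttc(spacing_m, follower_speed_mps, leader_speed_mps):
    """Inverse time-to-collision in 1/s (standard definition).

    Positive values mean the follower is closing in on the leader.
    """
    if spacing_m <= 0.0:
        raise ValueError("spacing must be positive")
    return (follower_speed_mps - leader_speed_mps) / spacing_m


# A stopped follower 10 m behind its leader: the plain time gap would divide
# by zero, while a (hypothetical) 2 m/s virtual speed keeps it finite.
print(time_gap(10.0, 0.0, virtual_speed_mps=2.0))
print(inverse_ttc(20.0, 10.0, 8.0))
```

Working with iTTC rather than TTC has the practical benefit that the value stays bounded when the two vehicles travel at the same speed, where TTC itself is undefined.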

Figure 1. TD3 model training process

Figure 2. Rolling mean episode reward

Figure 3. TTC probability density function

Figure 4. iTTC probability density function

Figure 5. Energy consumption probability density function

Figure 6. Change in vehicle acceleration

Figure 7. Change in vehicle speed

Figure 8. Change in car-following distance

Figure 9. Time gap probability density function

Figure 10. Jerk probability density function

Figure 11. Comparison of car-following speed

Figure 12. Comparison of car-following acceleration

Figure 13. Comparison of vehicle trajectories at different TD3 model penetration rates

Table 1. Model hyperparameters

Parameter	Value
Actor network learning rate	0.0001
Critic network learning rate	0.0002
Batch size	512
Replay buffer size	50 000
Discount factor	0.95
Soft update rate	0.01
Actor network delayed update frequency	2
α₀	5
α₁	-120
α₂	0.05
α₃	0.4
α₄	0.1
α₅	-1.2
α₆	1
α₇	-0.3
t₀	0.5
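The training hyperparameters in Table 1 map onto the usual TD3 update rules. The following is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, network weights are represented as plain lists of floats for illustration, and the delayed update frequency of 2 is assumed to mean one actor update per two critic updates, as in standard TD3. The reward coefficients α₀ to α₇ and t₀ are left out because this page does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3Config:
    """Training hyperparameters copied from Table 1."""
    actor_lr: float = 1e-4
    critic_lr: float = 2e-4
    batch_size: int = 512
    replay_buffer_size: int = 50_000
    discount: float = 0.95
    tau: float = 0.01          # soft update rate
    policy_delay: int = 2      # actor delayed update frequency


def soft_update(target_weights, online_weights, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


def should_update_actor(critic_step, policy_delay):
    """True on every policy_delay-th critic update (delayed policy update)."""
    return critic_step % policy_delay == 0


cfg = TD3Config()
target = [0.0, 1.0]
online = [1.0, 1.0]
for step in range(1, 5):
    # The critics are updated on every step; the actor and the target
    # networks are refreshed only on the delayed steps.
    if should_update_actor(step, cfg.policy_delay):
        target = soft_update(target, online, cfg.tau)
print(target)
```

With a soft update rate of 0.01, the target weights move only 1% of the way toward the online weights per refresh, which is what keeps the bootstrapped value targets slowly varying.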

Table 2. Comparison of safety and fuel consumption

Penetration rate (%)	Mean iTTC value (s)	Relative change (%)	Mean fuel consumption (mL)	Relative change (%)
0	32.22	0	247.49	0
20	26.21	-18.65	246.18	-0.52
40	22.10	-31.41	243.68	-1.54
60	16.37	-49.19	238.77	-3.52
80	10.12	-68.59	233.18	-5.78
100	8.63	-73.21	232.71	-5.97
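The relative change columns in Table 2 can be recomputed from the tabulated means, taking the 0% penetration row as the baseline. This is a small sketch with a hypothetical helper name. Because the table entries are already rounded to two decimals, a recomputed value may differ from the published one in the last digit.

```python
def relative_change_pct(value, baseline):
    """Percentage change of value relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Table 2 entries, keyed by TD3 penetration rate in percent.
mean_ittc = {0: 32.22, 20: 26.21, 40: 22.10, 60: 16.37, 80: 10.12, 100: 8.63}
mean_fuel_ml = {0: 247.49, 20: 246.18, 40: 243.68,
                60: 238.77, 80: 233.18, 100: 232.71}

for rate in mean_ittc:
    ittc_change = relative_change_pct(mean_ittc[rate], mean_ittc[0])
    fuel_change = relative_change_pct(mean_fuel_ml[rate], mean_fuel_ml[0])
    print(f"{rate:>3}%  iTTC {ittc_change:+.2f}%  fuel {fuel_change:+.2f}%")
```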

Table 3. Comparison of driving efficiency and comfort

Penetration rate (%)	Mean speed during 100 to 200 s (m/s)	Relative change (%)	Mean sum of absolute jerk (m/s³)	Relative change (%)
0	7.59	0	51.81	0
20	7.71	1.58	45.57	-12.04
40	7.82	3.03	39.68	-23.41
60	8.06	6.19	33.35	-35.63
80	8.21	8.16	24.76	-52.21
100	8.26	8.82	22.82	-55.95
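The comfort column in Table 3 reports a mean sum of absolute jerk. The page does not say how it is computed, so the sketch below rests on assumptions: jerk is approximated as the finite difference of sampled acceleration, the absolute values are summed per vehicle, and the result is averaged over vehicles. The function names, the sample accelerations, and the time step are all hypothetical.

```python
def sum_abs_jerk(accelerations, dt):
    """Sum of |jerk| for one vehicle, with jerk = (a[k+1] - a[k]) / dt."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return sum(abs(b - a) / dt
               for a, b in zip(accelerations, accelerations[1:]))


def mean_sum_abs_jerk(fleet_accelerations, dt):
    """Average of the per-vehicle sums over all vehicles in the platoon."""
    if not fleet_accelerations:
        raise ValueError("need at least one vehicle")
    totals = [sum_abs_jerk(series, dt) for series in fleet_accelerations]
    return sum(totals) / len(totals)


# Two illustrative vehicles sampled every 0.5 s (made-up accelerations, m/s^2).
fleet = [
    [0.0, 1.0, 0.0],   # accelerates, then eases off
    [0.0, 0.0, 0.0],   # holds constant speed
]
print(mean_sum_abs_jerk(fleet, dt=0.5))
```

A lower value means smoother acceleration profiles, which is the direction of the improvement reported as the TD3 penetration rate rises.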

