Lane-change Control for Unmanned Vehicle Based on REINFORCE Algorithm and Neural Network

YAN Hao (闫浩); LIU Xiaozhu (刘小珠); SHI Ying (石英)

doi:10.3963/j.jssn.1674-4861.2021.01.0019

School of Automation, Wuhan University of Technology, Wuhan 430070, China

Funding:

National Natural Science Foundation of China: 51805388

Major Technological Innovation Project of Hubei Province: 2019AAA025

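About the author:
YAN Hao (born 1996), master's student. Research interests: reinforcement learning, autonomous driving. E-mail: 675462459@qq.com

Corresponding author:
SHI Ying (born 1975), Ph.D., professor. Research interests: deep learning, autonomous driving, big data. E-mail: a_laly@163.com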

CLC number: U461.1
Publication history
- Received: 2020-09-25
- Published: 2021-02-28

Abstract

For lane-change and overtaking scenarios of unmanned vehicles, this paper studies a lane-change control strategy based on the REINFORCE algorithm and neural networks. The feedback variables, the control input, and the output-limit requirements of the controller are determined from a vehicle dynamics model. The structure of a neural-network controller is designed, and a training scheme for this controller is built on the REINFORCE algorithm. Because the data in the experience pool have excessively large values and variance, a preprocessing method for the experience-pool data is proposed to improve the training scheme. Taking the operating scenario of unmanned vehicles into account, the sparse reward distribution that arises during reinforcement learning is analyzed, and a reward-shaping solution based on a logarithmic function is proposed. The controller is then compared experimentally with a PID controller and an LQR controller. The results show that, compared with PID, the proposed strategy has a smaller maximum error and a safer lane-change process; compared with LQR, its performance is similar, which demonstrates its feasibility for the lane-change control of unmanned vehicles. In addition, the execution time of the strategy on different platforms is recorded to demonstrate its real-time performance and its feasibility on lightweight platforms.

Keywords:
- traffic control
- unmanned vehicle
- lane-change control
- reinforcement learning
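The abstract names two ingredients of the training scheme: REINFORCE itself, and a preprocessing step for experience-pool data whose values and variance are too large. The exact preprocessing is not given in this section; the minimal sketch below assumes it is the common practice of standardizing the discounted returns of an episode (subtract the mean, divide by the standard deviation) before they weight the log-probability gradients. The function names and the discount factor `gamma` are illustrative assumptions.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def normalize(values, eps=1e-8):
    """Standardize to zero mean and unit variance (assumed preprocessing step)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # eps avoids division by zero when all returns are equal.
    return [(v - mean) / (std + eps) for v in values]

def reinforce_weights(rewards, gamma=0.99):
    """Per-step weights w_t for the REINFORCE loss  -sum_t w_t * log pi(a_t | s_t)."""
    return normalize(discounted_returns(rewards, gamma))
```

In a full training loop each weight `w_t` scales the gradient of `log pi(a_t | s_t)`; standardizing keeps the size of each update roughly constant across episodes, which is one common way to tame large, high-variance returns.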

Figure 1. Single-track model of the vehicle

Figure 2. Structure of the vehicle lane-change control system

Figure 3. Process of reinforcement learning

Figure 4. "0-1" setting, case 1

Figure 5. "0-1" setting, case 2
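Figures 4 and 5 illustrate "0-1" settings, and the abstract attributes a learning difficulty to sparse rewards, addressed with reward shaping based on a logarithmic function. The paper's reward formula is not reproduced in this section, so the sketch below is hypothetical: a 0-1 reward that pays 1 only when the lateral error lies inside a tolerance band, versus a log-shaped reward `-ln(|e| + eps)` that gives a graded signal at every step. `tolerance` and `eps` are assumed values.

```python
import math

def sparse_reward(lateral_error, tolerance=0.05):
    """Hypothetical 0-1 reward: 1 inside the tolerance band, otherwise 0."""
    return 1.0 if abs(lateral_error) <= tolerance else 0.0

def log_shaped_reward(lateral_error, eps=0.01):
    """Hypothetical log-shaped reward: grows as |e| shrinks and is defined for every error.

    eps keeps the value finite at e = 0, capping the reward at -ln(eps).
    """
    return -math.log(abs(lateral_error) + eps)

errors = [2.0, 1.0, 0.5, 0.1, 0.02]          # lateral errors in metres (illustrative)
sparse = [sparse_reward(e) for e in errors]
shaped = [log_shaped_reward(e) for e in errors]
```

With the 0-1 reward, an episode that never enters the band produces all-zero returns and therefore no REINFORCE gradient; the log-shaped reward still ranks an error of 1.0 m above an error of 2.0 m.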

Figure 6. Experimental results compared with PID at a vehicle speed of 10 m/s

Figure 7. Experimental results compared with PID at a vehicle speed of 15 m/s

Figure 8. Experimental results compared with PID at a vehicle speed of 20 m/s

Figure 9. Experimental results compared with PID at a vehicle speed of 25 m/s
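Figures 6 to 9 compare the proposed controller with a PID baseline. This section does not give the PID structure or gains; the following is a generic discrete PID acting on the lateral error, with output clamping to mirror the output-limit requirement mentioned in the abstract. The class name, the gains, and the limit `u_max` are hypothetical.

```python
class DiscretePID:
    """Generic discrete-time PID with symmetric output saturation (hypothetical gains)."""

    def __init__(self, kp, ki, kd, dt, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_max = u_max            # output limit, e.g. a steering bound
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        # The derivative term is zero on the first call (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        candidate_integral = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        if abs(u) <= self.u_max:
            # Commit the integral only when the output is not saturated
            # (conditional integration, a simple anti-windup scheme).
            self.integral = candidate_integral
            return u
        return max(-self.u_max, min(self.u_max, u))
```

Freezing the integral while the output is clamped keeps the integral term from growing during a long saturated phase, such as the start of a lane change.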

Figure 10. Experimental results compared with LQR at a vehicle speed of 10 m/s

Figure 11. Experimental results compared with LQR at a vehicle speed of 15 m/s

Figure 12. Experimental results compared with LQR at a vehicle speed of 20 m/s

Figure 13. Experimental results compared with LQR at a vehicle speed of 25 m/s
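Figures 10 to 13 compare against an LQR controller. The LQR design model is not given in this section. As a self-contained illustration only, the sketch below computes a discrete-time LQR gain by iterating the Riccati equation P = Q + A^T P A - A^T P B (R + B^T P B)^-1 B^T P A, using an assumed double-integrator model of lateral offset (not the vehicle model of Table 1) with hypothetical weights and sample time.

```python
def matmul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def dlqr_single_input(A, B, Q, r, iters=500):
    """Gain K of u = -K x for a single-input discrete system, via Riccati iteration."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    for _ in range(iters):
        btpa = matmul(Bt, matmul(P, A))[0]          # row vector B^T P A
        s = r + matmul(Bt, matmul(P, B))[0][0]      # scalar R + B^T P B
        atpa = matmul(At, matmul(P, A))
        P = [[Q[i][j] + atpa[i][j] - btpa[i] * btpa[j] / s for j in range(n)]
             for i in range(n)]
    btpa = matmul(Bt, matmul(P, A))[0]
    s = r + matmul(Bt, matmul(P, B))[0][0]
    return [v / s for v in btpa]

# Hypothetical model: state [lateral offset y, lateral velocity v], input = lateral acceleration.
DT = 0.05
A = [[1.0, DT], [0.0, 1.0]]
B = [[0.5 * DT ** 2], [DT]]
Q = [[10.0, 0.0], [0.0, 1.0]]
K = dlqr_single_input(A, B, Q, r=1.0)

x = [[1.0], [0.0]]                    # start 1 m from the target lane centre line
for _ in range(400):                  # 20 s of closed-loop simulation
    u = -(K[0] * x[0][0] + K[1] * x[1][0])
    x = [[row[0] + b_row[0] * u] for row, b_row in zip(matmul(A, x), B)]
```

Raising the offset weight in `Q` relative to `r` gives a faster but more aggressive correction; this weighting is the main tuning knob of an LQR baseline.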

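Table 1. Fixed parameters of the vehicle

Parameter	Description	Value	Unit
s_f	Slip ratio of the front wheels	0.2	-
s_r	Slip ratio of the rear wheels	0.2	-
a	Distance from the center of mass to the front axle	1.232	m
b	Distance from the center of mass to the rear axle	1.468	m
C_cf	Cornering stiffness of the front tires	66 900	N/rad
C_cr	Cornering stiffness of the rear tires	62 700	N/rad
C_lf	Longitudinal stiffness of the front tires	66 900	N
C_lr	Longitudinal stiffness of the rear tires	62 700	N
m	Vehicle mass	1 723	kg
I_z	Yaw moment of inertia	4 175	kg·m²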
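Figure 1 and Table 1 describe a single-track model, but its equations are not reproduced in this section. The minimal sketch below uses the standard linear single-track lateral model with the Table 1 values, assuming constant longitudinal speed, small angles, linear tires, SI units, and C_cf and C_cr as per-axle cornering stiffness (some formulations double per-tire values instead). The slip ratios and longitudinal stiffnesses are not used here; they enter models that include longitudinal tire forces. Forward-Euler integration is checked against the closed-form steady-state yaw rate.

```python
# Table 1 values; SI units and per-axle cornering stiffness are assumptions.
A_CG = 1.232    # a: center of mass to front axle [m]
B_CG = 1.468    # b: center of mass to rear axle [m]
C_F = 66900.0   # C_cf: front cornering stiffness [N/rad]
C_R = 62700.0   # C_cr: rear cornering stiffness [N/rad]
MASS = 1723.0   # m [kg]
I_Z = 4175.0    # I_z [kg*m^2]

def lateral_derivatives(vy, r, delta, vx):
    """Linear single-track model: returns (d vy/dt, d r/dt) for steering angle delta."""
    alpha_f = delta - (vy + A_CG * r) / vx    # front slip angle
    alpha_r = -(vy - B_CG * r) / vx           # rear slip angle
    fy_f = C_F * alpha_f                      # linear front lateral force
    fy_r = C_R * alpha_r                      # linear rear lateral force
    dvy = (fy_f + fy_r) / MASS - vx * r
    dr = (A_CG * fy_f - B_CG * fy_r) / I_Z
    return dvy, dr

def simulate(delta, vx, t_end=8.0, dt=1e-3):
    """Forward-Euler response to a constant steering angle, starting from rest."""
    vy, r = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dvy, dr = lateral_derivatives(vy, r, delta, vx)
        vy += dvy * dt
        r += dr * dt
    return vy, r

def steady_state_yaw_rate(delta, vx):
    """r_ss = vx * delta / (L + K_us * vx^2), K_us = m / L * (b / C_f - a / C_r)."""
    wheelbase = A_CG + B_CG
    k_us = MASS / wheelbase * (B_CG / C_F - A_CG / C_R)
    return vx * delta / (wheelbase + k_us * vx ** 2)
```

With these values `b / C_f > a / C_r`, so the understeer gradient is positive and the model is stable at every speed, which is why the simulated yaw rate settles at the closed-form value.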

Table 2. Parameters of the neural network

	Layer 1	Layer 2
Input dimension	5	200
Output dimension	200	51
Activation function	tanh	none
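Table 2 fixes a two-layer network: 5 inputs, 200 tanh hidden units, and 51 linear outputs. This section does not say what the 51 outputs represent; the minimal sketch below assumes they are logits of a softmax over 51 evenly spaced steering commands, a common way to apply REINFORCE to a discrete action set. The state contents, the steering limit `MAX_STEER`, and the weight initialisation are hypothetical.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 5, 200, 51        # layer sizes from Table 2

def init_layer(n_in, n_out, rng):
    """Uniform random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(state, layers):
    """Layer 1: 5 -> 200 with tanh; layer 2: 200 -> 51 linear, then an assumed softmax."""
    hidden = [math.tanh(h) for h in dense(state, layers[0])]
    return softmax(dense(hidden, layers[1]))

def logit_gradient(probs, action):
    """Gradient of log pi(action | state) w.r.t. the logits: one_hot(action) - probs."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

rng = random.Random(0)
layers = [init_layer(N_IN, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUT, rng)]
state = [0.5, -0.1, 0.0, 0.02, 10.0]       # hypothetical 5-element observation
probs = policy(state, layers)
action = rng.choices(range(N_OUT), weights=probs)[0]
MAX_STEER = 0.5                            # hypothetical steering limit [rad]
steer = -MAX_STEER + 2 * MAX_STEER * action / (N_OUT - 1)
```

A REINFORCE update would multiply `logit_gradient(probs, action)` by the step's (normalized) return and back-propagate it through both layers.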

Table 3. Error after the lane change and maximum error during the lane change

Vehicle speed (m/s)	Controller	Error after lane change (m)	Maximum error during lane change (m)
10	REINFORCE	0.02	0.06
10	PID	0	0.17
10	LQR	0	0.02
15	REINFORCE	0.04	0.07
15	PID	0	0.17
15	LQR	0	0.05
20	REINFORCE	0.06	0.07
20	PID	0	0.17
20	LQR	0	0.12
25	REINFORCE	0.08	0.10
25	PID	0	0.17
25	LQR	0	0.19
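The maximum-error column of Table 3 supports the comparison in the abstract; the short calculation below, using only values copied from the table, expresses REINFORCE's maximum error as a percentage reduction relative to each baseline (negative values mean REINFORCE's error is larger).

```python
# Maximum error during the lane change in metres, copied from Table 3.
MAX_ERROR = {
    10: {"REINFORCE": 0.06, "PID": 0.17, "LQR": 0.02},
    15: {"REINFORCE": 0.07, "PID": 0.17, "LQR": 0.05},
    20: {"REINFORCE": 0.07, "PID": 0.17, "LQR": 0.12},
    25: {"REINFORCE": 0.10, "PID": 0.17, "LQR": 0.19},
}

def reduction_vs(baseline, speed):
    """Percentage reduction of REINFORCE's maximum error relative to a baseline."""
    row = MAX_ERROR[speed]
    return 100.0 * (row[baseline] - row["REINFORCE"]) / row[baseline]

for speed in sorted(MAX_ERROR):
    print(f"{speed} m/s: vs PID {reduction_vs('PID', speed):.1f}%, "
          f"vs LQR {reduction_vs('LQR', speed):.1f}%")
```

Against PID the reduction falls from about 65% at 10 m/s to about 41% at 25 m/s. Against LQR the sign flips with speed: REINFORCE has the larger maximum error at 10 and 15 m/s and the smaller one at 20 and 25 m/s, while its residual error after the lane change (0.02 to 0.08 m) is the only non-zero one in the table.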

Table 4. Running time of the neural-network controller

Platform	Total simulation time (s)	Total simulation steps	Average time per step (s)
Computer	2.834 99	1 202	0.002 36
TX2	3.898 25	1 202	0.003 24
Jetson Nano	4.859 62	1 202	0.004 04
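The last column of Table 4 is the total simulation time divided by the number of steps. The check below recomputes it from the table's own figures and also expresses each platform's per-step time as steps per second and as a multiple of the computer's time.

```python
# Rows of Table 4: platform -> (total simulation time in s, number of simulation steps).
RUNS = {
    "Computer": (2.83499, 1202),
    "TX2": (3.89825, 1202),
    "Jetson Nano": (4.85962, 1202),
}

per_step = {name: total / steps for name, (total, steps) in RUNS.items()}
slowdown = {name: t / per_step["Computer"] for name, t in per_step.items()}

for name in RUNS:
    print(f"{name}: {per_step[name] * 1000:.2f} ms/step, "
          f"{1 / per_step[name]:.0f} steps/s, {slowdown[name]:.2f}x the computer's time")
```

The recomputed values agree with the table to the printed precision; even on the Jetson Nano one controller step takes about 4 ms.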
