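An Automatic Detection Method for Traffic Accidents Based on ADASYN-XGBoost

CHEN Junyu¹, LI Jinlong¹, XU Lunhui¹,², WU Pan¹, LIN Yongjie¹

1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510641, China
2. IT Academy, Guangdong University of Science and Technology, Dongguan 510812, Guangdong, China

doi: 10.3963/j.jssn.1674-4861.2023.03.002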

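Funding:

National Natural Science Foundation of China, project 52072130

About the author:
CHEN Junyu (born 1997), master's student. Research interests: traffic safety and data mining. E-mail: 1113913426@qq.com

Corresponding author:
LI Jinlong (born 1993), doctoral student. Research interests: spatio-temporal data modeling and traffic signal control. E-mail: 202010101569@mail.scut.edu.cn

Chinese Library Classification (CLC) number: U491.3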
Publication history
- Received: 2022-09-22
- Published online: 2023-09-16


Abstract: Data-driven automatic detection of road traffic accidents plays an important role in timely rescue and in reducing the impact of accidents. To address the sample imbalance problem in automatic traffic accident detection, this paper studies a detection method that combines the adaptive synthetic oversampling technique with the extreme gradient boosting tree algorithm (ADASYN-XGBoost). First, to effectively mine the intrinsic correlation between the spatio-temporal features of the data and accident occurrence from imbalanced traffic accident samples, an initial combination of feature variables is constructed, and the adaptive synthetic oversampling method (ADASYN) is introduced to balance the number of samples in the accident and non-accident classes, thereby improving the quality of the training data. Second, to improve detection performance, a traffic accident detection model based on extreme gradient boosting (XGBoost) is built and used to select features from the augmented data samples. Finally, to obtain the best parameter combination, a Bayesian optimization algorithm is used to calibrate the XGBoost parameters quickly. The ADASYN-XGBoost method is validated and empirically studied on the Portland freeway dataset. The results show that, compared with state-of-the-art benchmark models, ADASYN-XGBoost achieves the best value on every detection metric, with an F1 score of 94.47% and a false detection rate as low as 8.95%. With 2800, 500 (18% of the initial sample size), and 150 (5% of the initial sample size) training samples, the F1 scores of ADASYN-XGBoost are 94.47%, 88.89%, and 81.93%, respectively. In further ablation experiments, balancing the positive and negative samples improves the performance metrics of the benchmark models by 2.68% to 44.85%. The proposed method can effectively address the sample imbalance problem in road traffic accident detection and provides technical support for road traffic safety prevention and accident handling.

Keywords:
- intelligent transportation
- automatic detection of road traffic accidents
- sample imbalance
- adaptive synthetic sampling technique
- extreme gradient boosting tree algorithm
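
The oversampling step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy data, and the choice to interpolate only toward minority-class neighbours are assumptions made for illustration. The idea it shows is the core of ADASYN: minority samples surrounded by more majority-class neighbours receive more synthetic samples, each created by interpolating between two minority points.

```python
import math
import random


def knn_indices(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Generate synthetic minority-class (label 1) samples, ADASYN style."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == 1]
    n_major = len(y) - len(min_idx)
    n_new = int((n_major - len(min_idx)) * beta)  # samples needed for balance
    if n_new <= 0 or len(min_idx) < 2:
        return []

    # Difficulty of each minority point: share of majority-class points
    # among its k nearest neighbours in the full data set.
    ratios = []
    for i in min_idx:
        neigh = knn_indices(i, X, k)
        ratios.append(sum(1 for j in neigh if y[j] == 0) / len(neigh))
    total = sum(ratios)
    if total == 0:
        return []

    minority = [X[i] for i in min_idx]
    synthetic = []
    for pos, ratio in enumerate(ratios):
        n_i = round(ratio / total * n_new)  # harder points get more samples
        neigh = knn_indices(pos, minority, k)  # assumption: minority neighbours only
        for _ in range(n_i):
            partner = minority[rng.choice(neigh)]
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(
                [a + lam * (b - a) for a, b in zip(minority[pos], partner)]
            )
    return synthetic


# Toy data (invented): 12 non-accident points on a line, 4 accident points.
X = [[float(i), 0.0] for i in range(12)]
X += [[3.2, 0.1], [3.4, -0.1], [8.1, 0.2], [8.3, 0.0]]
y = [0] * 12 + [1] * 4
new_points = adasyn(X, y, k=3)
print(len(new_points), "synthetic accident samples")
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch only makes the weighting and interpolation steps visible.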

Figure 1. Location distribution of traffic accidents and detectors

Figure 2. ADASYN-XGBoost automatic traffic accident detection framework

Figure 3. Heatmap of model performance with different training sample sizes

Figure 4. Boxplots comparing original data with generated data

Figure 5. Feature importance ranking

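Table 1. Initial feature variable set and its representation method

Time	Location	Parameter	Symbol	No.
1 min before the accident	Upstream detector	Traffic flow/speed/occupancy	b1_up_vol/sped/ocup	1/2/3
1 min before the accident	Downstream detector	Traffic flow/speed/occupancy	b1_dn_vol/sped/ocup	4/5/6
2 min before the accident	Upstream detector	Traffic flow/speed/occupancy	b2_up_vol/sped/ocup	7/8/9
2 min before the accident	Downstream detector	Traffic flow/speed/occupancy	b2_dn_vol/sped/ocup	10/11/12
3 min before the accident	Upstream detector	Traffic flow/speed/occupancy	b3_up_vol/sped/ocup	13/14/15
3 min before the accident	Downstream detector	Traffic flow/speed/occupancy	b3_dn_vol/sped/ocup	16/17/18
1 min after the accident	Upstream detector	Traffic flow/speed/occupancy	a1_up_vol/sped/ocup	19/20/21
1 min after the accident	Downstream detector	Traffic flow/speed/occupancy	a1_dn_vol/sped/ocup	22/23/24
2 min after the accident	Upstream detector	Traffic flow/speed/occupancy	a2_up_vol/sped/ocup	25/26/27
2 min after the accident	Downstream detector	Traffic flow/speed/occupancy	a2_dn_vol/sped/ocup	28/29/30
3 min after the accident	Upstream detector	Traffic flow/speed/occupancy	a3_up_vol/sped/ocup	31/32/33
3 min after the accident	Downstream detector	Traffic flow/speed/occupancy	a3_dn_vol/sped/ocup	34/35/36
Time of the accident	Upstream detector	Traffic flow/speed/occupancy	now_up_vol/sped/ocup	37/38/39
Time of the accident	Downstream detector	Traffic flow/speed/occupancy	now_dn_vol/sped/ocup	40/41/42
Time of the accident	Upstream detector	Predicted traffic flow/speed/occupancy	pred_up_vol/sped/ocup	43/44/45
Time of the accident	Downstream detector	Predicted traffic flow/speed/occupancy	pred_dn_vol/sped/ocup	46/47/48
Time of the accident	Upstream and downstream	Upstream-downstream difference in traffic flow/speed/occupancy	up_dn_vol/sped/ocup	49/50/51
Time of the accident	Upstream detector	Difference between predicted and detected values of the 3 parameters	up_now_pred_vol/sped/ocup	52/53/54
Time of the accident	Downstream detector	Difference between predicted and detected values of the 3 parameters	dn_now_pred_vol/sped/ocup	55/56/57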
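
The naming scheme in Table 1 is regular enough to generate programmatically. The sketch below rebuilds the 57 feature names in table order. It assumes that every "after the accident" row uses the `a` prefix (so features 19 to 21 are `a1_up_*`) and that underscores separate all name parts; the function name `feature_names` is hypothetical.

```python
PARAMS = ("vol", "sped", "ocup")  # traffic flow, speed, occupancy


def feature_names():
    """Return the 57 feature names of Table 1 in index order (1..57)."""
    names = []
    # Readings 1-3 min before (b) and after (a) the accident,
    # upstream detector first, then downstream.
    # Assumption: all "after" rows use the "a" prefix.
    for phase in ("b", "a"):
        for minute in (1, 2, 3):
            for side in ("up", "dn"):
                names += [f"{phase}{minute}_{side}_{p}" for p in PARAMS]
    # Detected and predicted values at the time of the accident.
    for prefix in ("now", "pred"):
        for side in ("up", "dn"):
            names += [f"{prefix}_{side}_{p}" for p in PARAMS]
    # Upstream-downstream differences at the time of the accident.
    names += [f"up_dn_{p}" for p in PARAMS]
    # Differences between predicted and detected values, per detector.
    for side in ("up", "dn"):
        names += [f"{side}_now_pred_{p}" for p in PARAMS]
    return names


names = feature_names()
print(len(names))           # 57
print(names[0], names[-1])  # b1_up_vol dn_now_pred_ocup
```

Generating the names this way keeps column order aligned with the index column of the table, which matters when feature importances are later mapped back to names.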

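Table 2. Hyperparameter tuning instructions

Method	Tuning process and result
LR	The LR model has no hyperparameters
SVM	Gaussian kernel
RF	100 sub-estimators; sampling with replacement
RSKNN	RSKNN uses 100 KNN models with 5 nearest neighbors as base learners; the maximum subspace sampling rate is 1.0, with sampling with replacement. Because KNN handles high-dimensional features poorly, each sample uses only the top 5 important features
BPNN	Two-layer neural network with ReLU activation, built from 2 linear layers with 30 hidden units each; cross-entropy loss, Adam optimizer, learning rate 0.01, and 200 epochs
E-SVM-KNN	SVM with Gaussian kernel; KNN with 5 nearest neighbors
FA-WRF	Factor analysis (FA) extracts 7 features; the WRF subspace dimension is 3; 100 sub-estimators
SASYNO-RF-RSKNN	RF is used only to extract important features and has no important hyperparameters; RSKNN hyperparameters are set as above
ADASYN-XGBoost	Hyperparameters are searched with the Bayesian method. The optimal XGBoost parameter combination is: maximum tree depth 6, learning rate 0.06, minimum child weight 2.46, regularization coefficient (gamma) 0.125, and subsample ratio (subsample) 0.79. For small-sample cases, only the top 8 important features are used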
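
The ADASYN-XGBoost row of Table 2 could be written down as a configuration like the one below. The values come from the table; the key names are the usual XGBoost parameter names, and mapping the table's wording onto them (for example "minimum child weight" to `min_child_weight`) is an assumption. `build_model` is a hypothetical helper and needs the third-party `xgboost` package when called.

```python
# Tuned values reported in Table 2 (key names are an assumed mapping
# onto standard XGBoost parameter names).
XGB_PARAMS = {
    "max_depth": 6,            # maximum tree depth
    "learning_rate": 0.06,     # learning rate
    "min_child_weight": 2.46,  # minimum child weight
    "gamma": 0.125,            # listed as "regularization coefficient (gamma)"
    "subsample": 0.79,         # row subsampling ratio
}

# Table 2: small-sample runs keep only the top 8 important features.
TOP_K_FEATURES_SMALL_SAMPLE = 8


def build_model(params=None):
    """Create an XGBoost classifier with the tabulated settings."""
    from xgboost import XGBClassifier  # imported lazily; third-party package

    return XGBClassifier(**(params or XGB_PARAMS))
```

Keeping the tuned values in one dictionary makes it straightforward to log them next to each experiment or to swap in the result of a new Bayesian search.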

Table 3. Comparison of the performance improvement of different models by data augmentation

Model	Positive-to-negative sample ratio	A_ACC/%	P_precision/%	D_DR/%	F_FDR/%	M_MCC/%	F1/%	F1 improvement/%
LR	1:8.62	94.33	100.00	45.63	54.37	65.36	62.40	44.85
LR	1:1.05	97.93	92.14	88.19	11.82	88.62	89.83
SVM	1:8.62	97.95	100.00	80.35	28.26	79.84	88.66	4.96
SVM	1:1.05	98.60	97.97	88.37	11.63	92.23	92.76
RF	1:8.62	95.68	97.53	60.42	39.58	74.67	74.12	13.72
RF	1:1.05	97.09	97.91	73.78	26.22	83.47	83.87
RSKNN	1:8.62	94.24	100.00	52.08	68.59	69.91	68.50	22.72
RSKNN	1:1.05	97.49	97.50	81.25	18.75	87.70	88.64
BPNN	1:8.62	97.84	98.74	79.93	20.07	87.65	88.08	7.20
BPNN	1:1.05	98.84	97.92	90.55	9.45	93.53	94.07
XGBoost	1:8.62	98.45	97.53	87.20	12.80	91.34	91.96	2.83
XGBoost	1:1.05	98.95	98.60	91.05	8.95	94.10	94.47
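
For readers who want to compute the metrics in Table 3 on their own predictions, the sketch below uses the standard confusion-matrix definitions of accuracy, precision, detection rate (recall), F1, and the Matthews correlation coefficient. The false detection rate is left out because this page does not give its formula. The counts in the usage lines are invented for illustration and are not taken from the paper.

```python
import math


def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and at
    least one positive prediction).
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    dr = tp / (tp + fn)  # detection rate, i.e. recall
    f1 = 2 * precision * dr / (precision + dr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "precision": precision, "DR": dr, "F1": f1, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With heavily imbalanced classes, accuracy alone stays high even when many accidents are missed, which is why F1 and MCC are reported alongside it.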

Table 4. Model performance with different numbers of features as input

Number of features	A_ACC	P_precision	D_DR	F_FDR	M_MCC	F1
57	0.9799	0.9429	0.8462	0.1539	0.8824	0.8919
55	0.9749	0.9143	0.8205	0.1795	0.8526	0.8649
50	0.9774	0.8947	0.8718	0.1282	0.8707	0.8831
45	0.9799	0.9429	0.8462	0.1539	0.8824	0.8919
40	0.9824	0.9706	0.8462	0.1539	0.8970	0.9041
35	0.9824	0.9444	0.8718	0.1282	0.8979	0.9066
30	0.9895	0.9860	0.9105	0.0895	0.9410	0.9447
25	0.9849	0.9460	0.8974	0.1026	0.9131	0.9211
20	0.9824	0.9440	0.8718	0.1282	0.8979	0.9067
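
Table 4 amounts to a small search over the number of input features. The sketch below copies the F1 column from the table and picks the feature count with the highest score; the helper `top_k_features` and the importance values passed to it are hypothetical and only show how a ranking could be truncated to that count.

```python
# F1 scores from Table 4, keyed by number of input features.
F1_BY_FEATURE_COUNT = {
    57: 0.8919, 55: 0.8649, 50: 0.8831, 45: 0.8919, 40: 0.9041,
    35: 0.9066, 30: 0.9447, 25: 0.9211, 20: 0.9067,
}

best_count = max(F1_BY_FEATURE_COUNT, key=F1_BY_FEATURE_COUNT.get)
print(best_count, F1_BY_FEATURE_COUNT[best_count])  # 30 0.9447


def top_k_features(importances, k):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


# Invented importance scores, for illustration only.
demo = {"now_up_sped": 0.30, "up_dn_ocup": 0.25, "b1_dn_vol": 0.05}
print(top_k_features(demo, 2))  # ['now_up_sped', 'up_dn_ocup']
```

The best row in the table, 30 features with an F1 of 0.9447, matches the 94.47% F1 score reported in the abstract.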
