A Recognition Model for Violent Sorting Activity Based on the ST-AGCN Algorithm

CAO Jingjing, YU Zhou, LI Pengfei, MIN Yanping, HUANG Qixian, ZHAO Qiangwei

Citation: CAO Jingjing, YU Zhou, LI Pengfei, MIN Yanping, HUANG Qixian, ZHAO Qiangwei. A Recognition Model for Violent Sorting Activity Based on the ST-AGCN Algorithm[J]. Journal of Transport Information and Safety, 2023, 41(5): 115-126. doi: 10.3963/j.jssn.1674-4861.2023.05.012


doi: 10.3963/j.jssn.1674-4861.2023.05.012
Funding:

National Natural Science Foundation of China Young Scientists Fund 61502360

Details
    Corresponding author:

    CAO Jingjing (1984—), PhD, Associate Professor. Research interests: machine learning and pattern recognition. E-mail: bettycao@whut.edu.cn

  • CLC number: U495


  • Abstract: Violent sorting by sorting workers is currently widespread in the express logistics industry. Image-based activity recognition can be used to reduce such behavior, but in real scenes it suffers from poor algorithm robustness and the difficulty of obtaining human joint-point data. To address these problems, a video dataset of violent sorting in logistics was built and a recognition model for violent sorting was studied. Sorting videos were collected with Raspberry Pi devices in both indoor and outdoor scenarios, real-time video transmission was implemented with Python's socket module, non-standard data were removed by slice-based screening rules, and joint-point data were extracted with the OpenPose model. Because general human activity recognition networks cannot adequately reflect how strongly individual joints influence violent sorting actions, ST-AGCN, an optimized graph neural network with ST-GCN as its backbone, was developed. A spatial attention mechanism learns the influence of each joint on the various actions and updates the joint weights accordingly, and an added adaptive graph-structure layer jointly optimizes the topology of the human skeleton graph and the network parameters in an end-to-end manner, highlighting the influence of highly correlated joints on action recognition. Comparison and ablation experiments against several deep learning models were conducted on violent sorting videos recorded indoors and outdoors. The results show that ST-AGCN improves the accuracy of recognizing violent sorting in real scenes by 5.6%, 13.82%, 2.36%, and 1.61% over ST-GCN, STA-LSTM, ST-AGCN without the spatial attention mechanism, and ST-AGCN without the adaptive graph-structure layer, respectively, and that it remains applicable in complex logistics sorting scenes with cluttered indoor or outdoor environments and partial occlusion, verifying the superiority of ST-AGCN and the effectiveness of the spatial attention mechanism and the adaptive graph-structure layer.
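
    The abstract notes that the collected video was transmitted in real time with Python's socket module. The sketch below shows one minimal way to stream length-prefixed JPEG frames from a Raspberry Pi to a collection server; it is not the authors' code, and the server address, port, and use of OpenCV for capture and encoding are assumptions for illustration.

import socket
import struct

import cv2  # OpenCV, assumed available on the Pi for capture and JPEG encoding

SERVER_ADDR = ("192.168.1.100", 9000)  # placeholder address of the collection server

def stream_camera(addr=SERVER_ADDR):
    """Capture frames from the default camera and stream them as JPEGs."""
    cap = cv2.VideoCapture(0)
    with socket.create_connection(addr) as sock:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ok, buf = cv2.imencode(".jpg", frame)  # compress to keep the payload small
            if not ok:
                continue
            data = buf.tobytes()
            # 4-byte big-endian length prefix so the receiver can split the stream
            sock.sendall(struct.pack(">I", len(data)) + data)
    cap.release()

    On the receiving side, the server would read 4 bytes, unpack the frame length with struct.unpack(">I", ...), then read exactly that many bytes per frame.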
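
    For the two model components described above, the following PyTorch sketch gives one possible spatial graph-convolution unit, assuming the common skeleton tensor layout x: (N, C, T, V) = (batch, channels, frames, joints). The per-joint sigmoid gate standing in for the spatial attention mechanism, and the learnable offset B standing in for the adaptive graph-structure layer, are simplified illustrations rather than the paper's released implementation; all names are hypothetical.

import torch
import torch.nn as nn

class AdaptiveSpatialGCN(nn.Module):
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        self.register_buffer("A", A)                # fixed, normalized skeleton adjacency (V x V)
        self.B = nn.Parameter(torch.zeros_like(A))  # adaptive graph, learned end to end
        self.joint_att = nn.Parameter(torch.ones(A.size(0)))  # one importance score per joint
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                           # x: (N, C, T, V)
        att = torch.sigmoid(self.joint_att)         # per-joint weights in (0, 1)
        x = x * att.view(1, 1, 1, -1)               # re-weight joints before aggregation
        adj = self.A + self.B                       # fixed topology plus learned offset
        x = torch.einsum("nctv,vw->nctw", x, adj)   # aggregate features over the graph
        return self.conv(x)                         # 1x1 conv mixes channels

    With OpenPose's 18-joint skeleton, A would be an 18 x 18 normalized adjacency matrix, and AdaptiveSpatialGCN(3, 64, A) maps a (batch, 3, frames, 18) joint-coordinate tensor to (batch, 64, frames, 18); a temporal convolution would follow in a full ST-AGCN unit.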

     

  • Figure 1. Camera placement positions

    Figure 2. Data transmission process

    Figure 3. OpenPose processing of the collected data

    Figure 4. Human joint points

    Figure 5. ST-AGCN network structure

    Figure 6. SAGCN network structure

    Figure 7. Results of the comparison experiment

    Figure 8. Results of the attention mechanism ablation experiment

    Figure 9. Adjacency matrices of the centripetal and centrifugal subsets

    Figure 10. Results of the adaptive graph ablation experiment

    Figure 11. Field test results

    Table 1. Outdoor violent sorting scenes and corresponding number of videos (unit: videos)

    Scene           Cluttered environment   Partial occlusion   Incomplete framing   Inside a van
    One person      10                      10                  10                   13
    Two persons     10                      10                  10                   13
    Three persons   10                      10                  10                   13

    Table 2. Indoor violent sorting scenes and corresponding number of videos (unit: videos)

    Scene           Insufficient lighting   Cluttered environment   Partial occlusion   Incomplete framing
    One person      10                      10                      13                  10
    Two persons     10                      10                      13                  10
    Three persons   10                      10                      13                  10

    Table 3. Number of video clips per action type

    Action type     Number of videos
    Normal          490
    Slam            241
    Kick            279
    Smash           272
    Toss            540

    Table 4. Results of the comparison experiment

    Model           Accuracy/%
    STA-LSTM        44.44
    ST-GCN          52.66
    Shift-GCN       57.22
    2s-AGCN         56.46
    ST-AGCN         58.26

    Table 5. Results of the attention mechanism ablation experiment

    Model            Accuracy/%   Average rejection rate/%
    ST-AGCN w/o SA   55.90        12.03
    ST-AGCN          58.26        10.67

    Table 6. Results of the adaptive graph ablation experiment

    Model                        Accuracy/%   Average rejection rate/%
    ST-AGCN w/o adaptive graph   56.65        11.61
    ST-AGCN                      58.26        10.67

    Table 7. Results of the unit stacking number ablation experiment

    Number of ST-AGCN layers   Accuracy/%   Average rejection rate/%   Time/s
    1                          40.36        24.38                      1 312
    3                          45.10        19.70                      2 150
    5                          52.66        14.65                      4 037
    7                          56.46        11.71                      6 808
    10                         58.26        10.67                      8 632
    12                         52.18        14.31                      10 550

    Table 8. Misidentification and rejection rates in the field test (unit: %)

    Action type     Misidentification rate   Rejection rate
    Toss            16.17                    12.45
    Kick            19.00                    12.99
    Normal          21.82                    8.61
    Smash           19.94                    12.92
    Fling           23.08                    6.37
Publication history
  • Received: 2022-09-04
  • Published online: 2024-01-18
