Crowd Count Neural Network Based on Attention Mechanism in Traffic Scenes

WANG Liyuan; YAO Yuntao; JIA Yang; XIAO Jinsheng; LI Bijun

doi:10.3963/j.jssn.1674-4861.2023.06.012

Volume 41 Issue 6

Dec. 2023


WANG Liyuan, YAO Yuntao, JIA Yang, XIAO Jinsheng, LI Bijun. Crowd Count Neural Network Based on Attention Mechanism in Traffic Scenes[J]. Journal of Transport Information and Safety, 2023, 41(6): 107-113. doi: 10.3963/j.jssn.1674-4861.2023.06.012


1. CCCC Second Highway Consultants Co., LTD, Wuhan 430056, China
2. School of Electronic Information, Wuhan University, Wuhan 430072, China
3. Sichuan Highway Planning, Survey, Design and Research Institute Co., LTD., Chengdu 610041, China
4. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China

Received Date: 2023-08-18
Available Online: 2024-04-03

Abstract

Crowd counting is an important task in computer vision. In traffic scenes, it plays a significant role in maintaining public traffic safety and advancing traffic intelligence. However, crowd counting in public traffic scenes is difficult because of pedestrian occlusion and complex backgrounds. To count crowds with high accuracy, an attention-based crowd density estimation network is proposed. The network consists of three parts. A feature extraction module generates multi-scale feature maps, which enhances the feature representation capability of the network and improves its robustness to variation in pedestrian scale. An attention module suppresses the background noise response, enhances the crowd feature response, and generates the probability distribution of the crowd region in the feature map, which improves the ability of the network to distinguish crowd regions from background regions. A density estimation module guides the network to regress a high-resolution crowd density map under the constraint of the attention mechanism, which improves the sensitivity of the network to crowd regions. In addition, a background-aware structure loss function is designed to reduce the false recognition rate of the model and improve its counting accuracy. A multi-level supervision mechanism is also adopted to guide learning, which aids gradient back-propagation and reduces over-fitting, further improving the counting accuracy of the network. Experiments are carried out on the public ShanghaiTech dataset. Compared with state-of-the-art algorithms, on the ShanghaiTechA and ShanghaiTechB datasets the mean absolute error (MAE) improves by 2.4% and 1.5%, and the mean square error (MSE) improves by 3.3% and 0.9%, respectively, which demonstrates the accuracy and robustness of the proposed algorithm in both crowded and sparse scenes. Experiments on a real-scene dataset yield MAE=7.7 and MSE=12.6, which indicates that the proposed algorithm is applicable in practice.
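
The two metrics reported above can be illustrated with a short sketch. It assumes that MSE denotes the root of the mean squared count error, a common convention in crowd counting benchmarks that the abstract itself does not spell out. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import numpy as np


def counting_errors(pred_counts, true_counts):
    """Return (MAE, MSE) over a set of test images.

    Assumption: MSE is the square root of the mean squared count error,
    as is common in crowd counting work; the abstract does not define it.
    """
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    diff = pred - true                      # per-image counting error
    mae = float(np.mean(np.abs(diff)))      # mean absolute error
    mse = float(np.sqrt(np.mean(diff ** 2)))  # root of mean squared error
    return mae, mse


# Hypothetical predicted and ground-truth counts for three images.
mae, mse = counting_errors([102.0, 48.0, 250.0], [100.0, 50.0, 245.0])
print(f"MAE={mae:.2f}, MSE={mse:.2f}")
```

Both metrics are computed on per-image total counts, so a lower value means the summed density maps are closer to the annotated head counts.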
Keywords:
- traffic safety
- crowd count
- attention mechanism
- background-aware structure loss algorithm
- multi-level supervision
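
The idea of regressing a density map under the constraint of an attention map can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes the attention map is a per-pixel sigmoid probability and that the constraint is an element-wise product, neither of which is stated in the abstract. The function names and toy arrays are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_gated_density(attention_logits, raw_density):
    """Gate a raw density map with a crowd-region probability map.

    Assumption: the attention constraint is an element-wise product,
    so background pixels (low probability) contribute almost nothing.
    """
    attention = sigmoid(np.asarray(attention_logits, dtype=float))
    density = attention * np.maximum(np.asarray(raw_density, dtype=float), 0.0)
    count = float(density.sum())  # estimated count is the sum of the density map
    return attention, density, count


# Toy 2x2 example: left column is crowd, right column is background.
logits = np.array([[10.0, -10.0], [10.0, -10.0]])
raw = np.array([[0.5, 0.5], [1.0, 1.0]])
attention, density, count = attention_gated_density(logits, raw)
print(round(count, 3))
```

In this toy case the background column is suppressed, so the estimated count comes almost entirely from the left column of the raw density map.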


