Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 626-640. doi: 10.1007/s11390-022-2204-8

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2022 •

CGTracker: Center Graph Network for One-Stage Multi-Pedestrian-Object Detection and Tracking

Xin Feng (冯欣), Senior Member, CCF, Member, IEEE, Hao-Ming Wu (吴浩铭), Yi-Hao Yin (殷一皓), and Li-Bin Lan (兰利彬), Member, CCF        

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2022-02-03 Revised: 2022-04-24 Accepted: 2022-05-06 Online: 2022-05-30 Published: 2022-05-30
  • Contact: Xin Feng E-mail: xfeng@cqut.edu.cn
  • About author: Xin Feng received her B.S. degree in computer science and technology from Chongqing University, Chongqing, in 2004, and her Ph.D. degree in computer applications from Chongqing University, Chongqing, in 2011. She is currently an associate professor at Chongqing University of Technology, Chongqing. She was a postdoctoral researcher at New York University, New York, from 2014 to 2016. Her research interests include computer vision and image and video processing.
  • Supported by:
    This work is partially supported by Humanities and Social Sciences of Chinese Ministry of Education Planning under Grant No. 17YJCZH043, the Key Project of Chongqing Technology Innovation and Application Development under Grant No. cstc2021jscx-dxwtBX0018, and the Scientific Research Foundation of Chongqing University of Technology under Grant No. 0103210650.

Most current online multi-object tracking (MOT) methods consist of two steps: object detection and data association, where the data association step relies on both object feature extraction and affinity computation. This often incurs additional computation cost and degrades the efficiency of MOT methods. In this paper, we combine the object detection and data association modules in a unified framework and eliminate the extra feature extraction process, to achieve a better speed-accuracy trade-off for MOT. Considering that pedestrians are the most common object category in real-world scenes and have particular characteristics in object relationships and motion patterns, we present a novel yet efficient one-stage pedestrian detection and tracking method, named CGTracker. In particular, CGTracker detects each pedestrian target as the center point of the object, and directly extracts the object features from the feature representation of the object center point, which is used to predict the axis-aligned bounding box. Meanwhile, the detected pedestrians are constructed as an object graph to facilitate the multi-object association process, where the semantic features, displacement information, and relative position relationships of the targets between two adjacent frames are used to perform reliable online tracking. CGTracker achieves a multiple object tracking accuracy (MOTA) of 69.3% and 65.3% at 9 FPS on MOT17 and MOT20, respectively. Extensive experimental results under widely used evaluation metrics demonstrate that our method is one of the best techniques on the leaderboard for the MOT17 and MOT20 challenges at the time of submission of this work.
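
The association idea described in the abstract (matching targets across two adjacent frames using semantic features, displacement, and relative position) can be illustrated with a small hand-written sketch. This is not the CGTracker network: the cost function, the centroid-offset stand-in for graph-based relative position, the greedy matching, and the names `associate`, `scale`, `weights`, and `max_cost` are all assumptions made only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def centroid_offsets(centers):
    """Offset of each center from the frame centroid: a crude proxy for the
    relative position of a target with respect to the other targets."""
    n = len(centers)
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    return [(x - mx, y - my) for x, y in centers]


def associate(prev, curr, scale=100.0, weights=(1.0, 1.0, 1.0), max_cost=1.5):
    """Greedily match previous-frame targets to current-frame detections.

    prev: list of dicts with 'center', 'feature', 'displacement'
    curr: list of dicts with 'center', 'feature'
    Returns {index in prev: index in curr}. All parameters are hypothetical.
    """
    if not prev or not curr:
        return {}
    w_feat, w_move, w_pos = weights
    prev_rel = centroid_offsets([p["center"] for p in prev])
    curr_rel = centroid_offsets([c["center"] for c in curr])

    pairs = []
    for i, p in enumerate(prev):
        # Predicted position: previous center shifted by its displacement.
        predicted = (p["center"][0] + p["displacement"][0],
                     p["center"][1] + p["displacement"][1])
        for j, c in enumerate(curr):
            feat_cost = 1.0 - cosine_similarity(p["feature"], c["feature"])
            move_cost = math.dist(predicted, c["center"]) / scale
            pos_cost = math.dist(prev_rel[i], curr_rel[j]) / scale
            cost = w_feat * feat_cost + w_move * move_cost + w_pos * pos_cost
            pairs.append((cost, i, j))

    matches, used_prev, used_curr = {}, set(), set()
    for cost, i, j in sorted(pairs):  # cheapest pairs first
        if cost > max_cost:
            break
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches


prev = [
    {"center": (100, 200), "feature": [1.0, 0.0], "displacement": (5, 0)},
    {"center": (300, 220), "feature": [0.0, 1.0], "displacement": (-5, 0)},
]
curr = [
    {"center": (296, 221), "feature": [0.1, 0.9]},
    {"center": (104, 199), "feature": [0.9, 0.1]},
]
print(sorted(associate(prev, curr).items()))  # [(0, 1), (1, 0)]
```

Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment solver could be substituted without changing the cost terms.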

Key words: pedestrian detection and tracking; object center; object graph



ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
