Journal of Computer Science and Technology, 2019, Vol. 34, Issue (3): 522-536. doi: 10.1007/s11390-019-1924-x

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia


Bidirectional Optimization Coupled Lightweight Networks for Efficient and Robust Multi-Person 2D Pose Estimation

Shuai Li (1,2), Member, IEEE; Zheng Fang (1); Wen-Feng Song (1); Ai-Min Hao (1), Member, IEEE; Hong Qin (3,*), Member, IEEE

  (1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China;
  (2) Beihang University Qingdao Research Institute, Qingdao 266000, China;
  (3) Department of Computer Science, Stony Brook University, Stony Brook 11790, U.S.A.
  • Received: 2018-12-30; Revised: 2019-03-20; Online: 2019-05-05; Published: 2019-05-06
  • Contact: Hong Qin, E-mail: qin@cs.stonybrook.edu
  • About author: Shuai Li received his Ph.D. degree in computer science from Beihang University, Beijing, in 2010. He is currently an associate professor at the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, and Beihang Qingdao Research Institute, Qingdao. His research interests include computer graphics, pattern recognition, computer vision, and medical image processing.
  • Supported by: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61672077 and 61532002, the Applied Basic Research Program of Qingdao under Grant No. 161013xx, and the National Science Foundation of USA under Grant Nos. IIS-0949467, IIS-1047715, IIS-1715985, IIS61672149, and IIS-1049448.

For multi-person 2D pose estimation, current deep-learning-based methods have achieved impressive performance, but the existing approaches still face unavoidable trade-offs among efficiency, robustness, and accuracy. In principle, bottom-up methods are more efficient than top-down methods, but they are less accurate. To make full use of the respective advantages of both, in this paper we design a novel bidirectional optimization coupled lightweight network (BOCLN) architecture for efficient, robust, and general-purpose multi-person 2D pose estimation from natural images. In the BOCLN framework, the bottom-up network focuses on global features, while the top-down network emphasizes detailed features. The entire framework shares global features along the bottom-up data stream, while the top-down data stream aims to accelerate accurate pose estimation. In particular, to exploit prior knowledge of the relationships between human joints, we propose a probability limb heat map that represents the spatial context of the joints and guides the prediction of the overall pose skeleton, so that each person's pose can be estimated as accurately and robustly as possible even in cluttered, crowded scenes. Benefiting from the BOCLN architecture, the time-consuming refinement procedure can be greatly simplified into an efficient lightweight network. Extensive experiments and evaluations on public benchmarks confirm that our method is more efficient and robust than state-of-the-art methods while still attaining competitive accuracy. BOCLN therefore shows considerable promise for online applications.
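The abstract does not give the formula for the probability limb heat map. Purely as an illustration, the sketch below assumes one common construction: the value at each pixel decays as a Gaussian of the pixel's distance to the line segment joining two connected joints, so it is 1 on the limb and falls toward 0 away from it, and the maps of different people are merged with a per-pixel maximum. The function names (`point_to_segment_distance`, `limb_heat_map`, `merge_max`) and the `sigma` width parameter are hypothetical and are not taken from BOCLN.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate limb (both joints coincide): plain point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line through a and b, then clamp the
    # projection parameter to [0, 1] so the closest point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def limb_heat_map(width: int, height: int, a: Point, b: Point,
                  sigma: float = 2.0) -> List[List[float]]:
    """Hypothetical limb map: value exp(-d^2 / (2 sigma^2)) at pixel (x, y),
    where d is the distance from (x, y) to the limb segment a-b.
    Rows are indexed by y, columns by x; values lie in [0, 1] and equal 1
    exactly on the limb."""
    heat = []
    for y in range(height):
        row = []
        for x in range(width):
            d = point_to_segment_distance((x, y), a, b)
            row.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
        heat.append(row)
    return heat


def merge_max(maps: List[List[List[float]]]) -> List[List[float]]:
    """Combine per-person maps of equal size with a per-pixel maximum."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*maps)]


if __name__ == "__main__":
    # Two illustrative people on an 8x6 grid: one horizontal limb, and one
    # limb whose joints coincide (e.g. a heavily foreshortened limb).
    person1 = limb_heat_map(8, 6, a=(1, 1), b=(6, 1))
    person2 = limb_heat_map(8, 6, a=(2, 4), b=(2, 4))
    combined = merge_max([person1, person2])
    print(combined[1][3])  # pixel (x=3, y=1) lies on person1's limb: prints 1.0
```

Taking the maximum rather than the sum keeps every value within [0, 1], so where limbs of different people overlap, the merged map can still be read as a per-pixel confidence.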

Key words: bidirectional optimization; computer vision; deep learning; probability limb heat map; 2D multi-person pose estimation


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn