Journal of Computer Science and Technology, 2019, Vol. 34, Issue (3): 522-536. doi: 10.1007/s11390-019-1924-x

Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2019 •

Multi-Person Pose Estimation Based on a Bidirectional Feature-Sharing Network

Shuai Li1,2, Member, IEEE, Zheng Fang1, Wen-Feng Song1, Ai-Min Hao1, Member, IEEE, Hong Qin3,*, Member, IEEE   

  1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China;
  2 Beihang University Qingdao Research Institute, Qingdao 266000, China;
  3 Department of Computer Science, Stony Brook University, Stony Brook 11790, U.S.A.
  • Received: 2018-12-30 Revised: 2019-03-20 Online: 2019-05-05 Published: 2019-05-06
  • Contact: Hong Qin E-mail: qin@cs.stonybrook.edu
  • About author: Shuai Li received his Ph.D. degree in computer science from Beihang University, Beijing, in 2010. He is currently an associate professor at the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, and Beihang Qingdao Research Institute, Qingdao. His research interests include computer graphics, pattern recognition, computer vision, and medical image processing.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61672077 and 61532002, the Applied Basic Research Program of Qingdao under Grant No. 161013xx, and the National Science Foundation of USA under Grant Nos. IIS-0949467, IIS-1047715, IIS-1715985, IIS61672149, and IIS-1049448.

Bidirectional Optimization Coupled Lightweight Networks for Efficient and Robust Multi-Person 2D Pose Estimation


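Deep learning based methods have shown good results on multi-person pose estimation in recent years, but current approaches still do not fully resolve the trade-off between accuracy and efficiency. In principle, bottom-up methods are more efficient than top-down methods but less accurate. To exploit the strengths of both, we design a lightweight network with bidirectional feature sharing for 2D multi-person pose estimation in natural scenes. In our framework, the bottom-up network focuses on global features, while the top-down network focuses on detailed features. Across the framework, global features are shared with the top-down network through the bottom-up data stream, so that skeleton joints can be located quickly and accurately. In addition, to exploit the prior relationships among human skeleton joints, we design a limb heat map that represents the spatial semantics between joints and guides skeleton prediction. As a result, our method predicts poses accurately and robustly even in cluttered, complex scenes. Thanks to the bidirectional feature-sharing framework, the time-consuming refinement step can be reduced to an efficient lightweight network. Experiments show that our method is more efficient and robust while reaching accuracy comparable to the current state of the art. Our bidirectional feature-sharing lightweight network shows better performance in real-time applications.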

Keywords: bidirectional optimization, computer vision, deep learning, limb heat probability map, 2D multi-person pose estimation

Abstract: For multi-person 2D pose estimation, current deep learning based methods have exhibited impressive performance, but existing approaches still face unavoidable trade-offs among efficiency, robustness, and accuracy. In principle, bottom-up methods are more efficient than top-down methods but less accurate. To make full use of their respective advantages, in this paper we design a novel bidirectional optimization coupled lightweight network (BOCLN) architecture for efficient, robust, and general-purpose multi-person 2D (2-dimensional) pose estimation from natural images. In the BOCLN framework, the bottom-up network focuses on global features, while the top-down network emphasizes detailed features. The framework shares global features along the bottom-up data stream, while the top-down data stream aims to accelerate accurate pose estimation. In particular, to exploit priors on the relationships among human joints, we propose a probability limb heat map that represents the spatial context of the joints and guides the overall pose skeleton prediction, so that each person's pose estimate in cluttered scenes (including crowds) is as accurate and robust as possible. Benefiting from the BOCLN architecture, the time-consuming refinement procedure can be simplified to an efficient lightweight network. Extensive experiments and evaluations on public benchmarks confirm that our method is more efficient and robust, yet still attains accuracy competitive with state-of-the-art methods. BOCLN shows even greater promise in online applications.

Key words: bidirectional optimization, computer vision, deep learning, probability limb heat map, 2D multi-person pose estimation
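
The abstract does not define the probability limb heat map, so the following is only an illustrative sketch of one plausible form: a map whose value decays with a Gaussian falloff as pixels move away from the line segment joining two joints. The function name `limb_heat_map`, the `sigma` parameter, and the Gaussian form are all assumptions made for this example and are not taken from the paper.

```python
import math


def limb_heat_map(joint_a, joint_b, width, height, sigma=1.5):
    """Hypothetical limb heat map: values in (0, 1], equal to 1 on the
    segment joint_a -> joint_b and decaying with distance from it.
    The Gaussian falloff is an assumption, not the paper's definition."""
    (ax, ay), (bx, by) = joint_a, joint_b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    heat = []
    for y in range(height):
        row = []
        for x in range(width):
            if length_sq == 0:
                t = 0.0  # degenerate limb: both joints coincide
            else:
                # Project the pixel onto the limb and clamp to the segment.
                t = ((x - ax) * dx + (y - ay) * dy) / length_sq
                t = max(0.0, min(1.0, t))
            px, py = ax + t * dx, ay + t * dy
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
        heat.append(row)
    return heat


# Example: a horizontal limb from (2, 2) to (7, 2) on a 10 x 5 grid.
heat = limb_heat_map((2, 2), (7, 2), width=10, height=5)
```

In this sketch, `heat[row][col]` is indexed by image row first. A network trained against such a target would see one connected ridge per limb rather than two isolated joint peaks, which is one way a map could encode the spatial context between joints.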
