Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (3): 522-536. doi: 10.1007/s11390-019-1924-x
Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia
• Special Section of CVM 2019 •
Shuai Li1,2, Member, IEEE, Zheng Fang1, Wen-Feng Song1, Ai-Min Hao1, Member, IEEE, Hong Qin3,*, Member, IEEE
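Deep learning-based methods have shown good results on multi-person pose estimation in recent years, yet existing approaches still have not fully resolved the trade-off between accuracy and speed. In principle, bottom-up methods are more efficient than top-down methods but less accurate. To combine the strengths of both, we design a lightweight network with bidirectional feature sharing for 2D multi-person pose estimation in natural scenes. In our framework, the bottom-up network focuses on global features and the top-down network focuses on detailed features. Global features are shared with the top-down network through the data flow of the bottom-up network, so that skeleton joints can be localized quickly and accurately. In addition, to exploit the prior relationships among human skeleton joints, we design a limb heatmap that represents the spatial semantics between joints and guides skeleton prediction. As a result, our method predicts accurately and robustly even in cluttered, complex scenes. Thanks to the bidirectional feature-sharing framework, the time-consuming refinement step can be reduced to an efficient lightweight network. Experiments show that our method is more efficient and robust while reaching accuracy comparable to the state of the art. Our bidirectional feature-sharing lightweight network also performs better in real-time applications.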
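The abstract does not define the limb heatmap precisely, so the following is only an illustrative sketch of one plausible reading: a map that is high along the line segment connecting two joints and decays with distance from it. The Gaussian falloff, the `sigma` value, the function names, and the pixel coordinates are all assumptions made for this example and are not taken from the paper.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment from a to b."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate limb: both joints coincide, so measure to that point.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of the point onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_heatmap(height, width, joint_a, joint_b, sigma=1.5):
    """Return a height x width grid of values in (0, 1].

    Assumption: the response is a Gaussian of the pixel's distance to the
    limb segment. The paper may use a different formulation.
    """
    (ax, ay), (bx, by) = joint_a, joint_b
    return [
        [
            math.exp(
                -point_segment_distance(x, y, ax, ay, bx, by) ** 2
                / (2.0 * sigma ** 2)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical joints given as (x, y) pixel coordinates on an 8 x 8 grid.
heatmap = limb_heatmap(8, 8, joint_a=(1, 1), joint_b=(6, 6))
```

In this sketch, pixels lying on the segment between the two joints receive the maximum response and pixels far from it receive values near zero, which is the kind of spatial cue between joints that the abstract says guides skeleton prediction.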