2017, Vol. 32, Issue (6): 1172-1185. doi: 10.1007/s11390-017-1792-1

Special Topic: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section on Selected Paper from NPC 2011 •


Non-Frontal Facial Expression Recognition Using a Depth-Patch Based Deep Neural Network

Nai-Ming Yao1,2, Hui Chen1,2,*, Member, CCF, Qing-Pei Guo1,2, Hong-An Wang1,2,3, Member, CCF, IEEE   

  1 Beijing Key Laboratory of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100049, China;
  3 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2017-06-20; Revised: 2017-09-27; Online: 2017-11-05; Published: 2017-11-05
  • Contact: Hui Chen, E-mail: chenhui@iscas.ac.cn
  • About author: Nai-Ming Yao is a Ph.D. candidate at the Institute of Software, Chinese Academy of Sciences, Beijing, and the University of Chinese Academy of Sciences, Beijing. His research interests include human-computer interaction, affective computing, machine learning, and computer vision.
  • Supported by:

    This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1001405, and the National Natural Science Foundation of China under Grant Nos. 61232013, 61422212, and 61661146002.


Abstract: Non-frontal head poses considerably reduce the accuracy and robustness of facial expression recognition when capturing expressions that occur during natural communication. In this paper, we attempt to recognize facial expressions under poses with large rotation angles from 2D videos. A depth-patch based 4D expression representation model is proposed; it is reconstructed from 2D dynamic images to delineate the continuous spatial changes and temporal context of non-frontal expressions. Furthermore, we present an effective deep neural network classifier that accurately captures pose-variant expression features from the depth patches and recognizes non-frontal expressions. Experimental results on the BU-4DFE database show that the proposed method achieves a high recognition accuracy of 86.87% for non-frontal facial expressions within a head rotation range of up to 52°, outperforming existing methods. We also present a quantitative analysis of the components contributing to the performance gain through tests on the BU-4DFE and Multi-PIE datasets.
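The paper's implementation is not reproduced on this page. As a rough, hypothetical illustration of the depth-patch idea the abstract describes (cropping fixed-size patches from a reconstructed depth map around facial landmark positions before feeding them to a classifier), here is a minimal sketch; the function name, landmark coordinates, and patch size are all invented for illustration:

```python
# Hypothetical sketch (not the authors' code): crop fixed-size depth
# patches around facial landmarks from a 2D depth map, the kind of
# pre-processing the abstract's "depth patches" suggest.

def extract_depth_patch(depth_map, center, size=3):
    """Crop a size x size patch from a 2D depth map around `center`.

    depth_map: list of equal-length rows of depth values.
    center: (row, col) landmark position.
    Patches that would cross the border are clamped inside the map.
    """
    rows, cols = len(depth_map), len(depth_map[0])
    half = size // 2
    r0 = min(max(center[0] - half, 0), rows - size)
    c0 = min(max(center[1] - half, 0), cols - size)
    return [row[c0:c0 + size] for row in depth_map[r0:r0 + size]]

# Toy 5x5 depth map and an invented landmark position.
depth = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
patch = extract_depth_patch(depth, (2, 2), size=3)
```

In the paper's setting such patches would be sampled from the 4D reconstruction per frame, so each patch carries both the local surface geometry and, across frames, the temporal context the abstract mentions.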
