1 Beijing Key Laboratory of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China;
3 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract Non-frontal head poses considerably reduce the accuracy and robustness of facial expression recognition when capturing expressions that occur during natural communication. In this paper, we attempt to recognize facial expressions under poses with large rotation angles from 2D videos. A depth-patch based 4D expression representation model is proposed. It is reconstructed from 2D dynamic images to delineate continuous spatial changes and temporal context in non-frontal cases. Furthermore, we present an effective deep neural network classifier, which can accurately capture pose-variant expression features from the depth patches and recognize non-frontal expressions. Experimental results on the BU-4DFE database show that the proposed method achieves a recognition accuracy of 86.87% for non-frontal facial expressions at head rotation angles of up to 52°, outperforming existing methods. We also present a quantitative analysis of the components contributing to the performance gain through tests on the BU-4DFE and Multi-PIE datasets.
This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1001405, and the National Natural Science Foundation of China under Grant Nos. 61232013, 61422212, and 61661146002.
About author: Nai-Ming Yao is a Ph.D. candidate at the Institute of Software, Chinese Academy of Sciences, Beijing, and the University of Chinese Academy of Sciences, Beijing. His research interests include human-computer interaction, affective computing, machine learning, and computer vision.
Cite this article:
Nai-Ming Yao, Hui Chen, Qing-Pei Guo, Hong-An Wang. Non-Frontal Facial Expression Recognition Using a Depth-Patch Based Deep Neural Network[J]. Journal of Computer Science and Technology, 2017, V32(6): 1172-1185