2016, Vol. 31, Issue (3): 489-500. doi: 10.1007/s11390-016-1642-6

Special Issue: Artificial Intelligence and Pattern Recognition

• Special Section of CVM 2016

Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks

Xi-Jin Zhang, Student Member, CCF, Yi-Fan Lu, Student Member, CCF, and Song-Hai Zhang*, Member, CCF, ACM   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2015-12-01 Revised: 2016-03-09 Online: 2016-05-05 Published: 2016-05-05
  • Contact: Song-Hai Zhang, E-mail: shz@tsinghua.edu.cn
  • Supported by:

    This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013903, the National Natural Science Foundation of China under Grant No. 61373069, the Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua University Initiative Scientific Research Program.

In this paper, we propose a multi-task system that identifies dish types, food ingredients, and cooking methods from food images using deep convolutional neural networks. We built a dataset of 360 food classes with at least 500 images per class. Because the images were collected from the Internet, the data are noisy; to reduce this noise, outlier images were detected and removed by a one-class SVM trained on deep convolutional features. We simultaneously trained a dish identifier, a cooking method recognizer, and a multi-label ingredient detector, which share a few low-level layers in the deep network architecture. The proposed framework achieves higher accuracy than traditional methods based on handcrafted features, and the cooking method recognizer and ingredient detector can be applied to dishes that are not included in the training dataset to provide reference information for users.
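The idea of three task heads sharing low-level layers can be illustrated with a minimal sketch. It is not the authors' architecture: the shared convolutional layers are replaced by a single small dense layer, and the layer sizes, the numbers of cooking methods and ingredients, the equal task weights, and all function names are assumptions made for illustration. Only the 360 dish classes come from the abstract. The sketch shows a softmax cross-entropy loss for each single-label task, a per-ingredient sigmoid cross-entropy loss for the multi-label task, and their weighted sum as the joint training objective.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax_cross_entropy(logits, target):
    """Single-label loss, used here for dish type and cooking method."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]

def sigmoid_cross_entropy(logits, targets):
    """Multi-label loss: one independent yes/no decision per ingredient."""
    # max(z, 0) - z * t + log(1 + exp(-|z|)) is the numerically stable form.
    return sum(max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
               for z, t in zip(logits, targets))

def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def multi_task_loss(features, params, dish, method, ingredients,
                    task_weights=(1.0, 1.0, 1.0)):
    # Shared representation, computed once and reused by all three heads.
    shared = relu(linear(features, *params["shared"]))
    losses = (
        softmax_cross_entropy(linear(shared, *params["dish"]), dish),
        softmax_cross_entropy(linear(shared, *params["method"]), method),
        sigmoid_cross_entropy(linear(shared, *params["ingredients"]), ingredients),
    )
    total = sum(w * l for w, l in zip(task_weights, losses))
    return total, losses

rng = random.Random(0)
params = {
    "shared": init_layer(rng, 8, 16),       # stand-in for the shared low-level layers
    "dish": init_layer(rng, 360, 8),        # 360 dish classes, as in the abstract
    "method": init_layer(rng, 5, 8),        # hypothetical number of cooking methods
    "ingredients": init_layer(rng, 10, 8),  # hypothetical number of ingredients
}
features = [rng.random() for _ in range(16)]  # stand-in for an image's features
total, (dish_loss, method_loss, ingredient_loss) = multi_task_loss(
    features, params, dish=42, method=2,
    ingredients=[1, 0, 0, 1, 0, 0, 0, 0, 1, 0])
print(round(total, 3))
```

Because the shared layer feeds every head, gradients of the summed loss from all three tasks would update it jointly during training, while each head sees only its own task's loss.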

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology