2017, Vol. 32, Issue (3): 507-519. doi: 10.1007/s11390-017-1740-0

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2017 •

Discriminative Histogram Intersection Metric Learning and Its Applications

Peng-Yi Hao 1, Member, CCF, IEEE, Yang Xia 1, Xiao-Xin Li 1, Sei-ichiro Kamata 2, Member, IEEE, Sheng-Yong Chen 1, Senior Member, CCF, IEEE

    1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
    2. Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
  • Received: 2016-12-25 Revised: 2017-03-09 Online: 2017-05-05 Published: 2017-05-05
  • Contact: 10.1007/s11390-017-1740-0
  • About author: Peng-Yi Hao received her Ph.D. degree in computer science from the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, in 2013. She is currently a lecturer at Zhejiang University of Technology, Hangzhou. Her current research interests include computer vision and image analysis. She is a member of CCF and IEEE.
  • Supported by:

    This work was supported by the Natural Science Foundation of Zhejiang Province of China under Grant Nos. LQ15F020008 and LY15F020028, the National Natural Science Foundation of China under Grant Nos. 61325019, 61402411, 61502424, and U1509207, and the Japan Society for the Promotion of Science (JSPS KAKENHI) under Grant No. 15K00248.

In this paper, a novel method called discriminative histogram intersection metric learning (DHIML) is proposed for pair matching and classification. Specifically, we introduce a discrimination term for learning a metric from binary information such as same/not-same or similar/dissimilar, and then combine it with the classification error so that the classifier is also trained discriminatively. Compared with conventional approaches, the proposed method has several advantages. 1) The histogram intersection strategy is incorporated into metric learning to handle the widely used histogram features effectively. 2) By introducing a discriminative term and a classification error term into metric learning, a more discriminative distance metric and a classifier can be learned together. 3) The objective function is robust to outliers and noise in both the features and the labels of the training data. The performance of the proposed method is tested on four applications: face verification, face-track identification, face-track clustering, and image classification. Evaluations on the challenging restricted protocol of the Labeled Faces in the Wild (LFW) benchmark, a dataset with more than 7000 face-tracks, and the Caltech-101 dataset validate the robustness and discriminability of the proposed metric learning compared with recent state-of-the-art approaches.
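As a point of orientation, the plain histogram intersection similarity that the abstract builds on can be sketched as follows. This is a minimal illustration and not the DHIML objective itself: the function names, the per-bin weight vector `w`, and the decision `threshold` are hypothetical choices made here to show how a learned weighting could enter a same/not-same decision.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two non-negative histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def weighted_histogram_intersection(h1, h2, w):
    """Hypothetical per-bin weighted variant (w is an assumed weight vector,
    standing in for whatever a metric-learning step would produce)."""
    if not (len(h1) == len(h2) == len(w)):
        raise ValueError("histograms and weights must have the same length")
    return sum(wi * min(a, b) for wi, a, b in zip(w, h1, h2))


def is_same_pair(h1, h2, threshold):
    """Illustrative pair-matching rule: 'same' when similarity reaches threshold."""
    return histogram_intersection(h1, h2) >= threshold


# Two L1-normalized histograms with three bins each.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
similarity = histogram_intersection(p, q)
```

For L1-normalized histograms the similarity lies in [0, 1], reaching 1 only when the two histograms are identical, which is why a single threshold can serve as a same/not-same rule in this sketch.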


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology