Journal of Computer Science and Technology, 2014, Vol. 29, Issue (5): 785-798. doi: 10.1007/s11390-014-1468-z

Special Issue: Surveys; Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Computer Graphics and Multimedia •

Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues

Zhi-Neng Chen1,2 (陈智能), Chong-Wah Ngo2 (杨宗桦), Member, IEEE, Wei Zhang2 (张炜), Juan Cao3 (曹娟), Yu-Gang Jiang4 (姜育刚)

    1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2. Department of Computer Science, City University of Hong Kong, Hong Kong, China;
    3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    4. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2014-02-24 Revised: 2014-07-03 Online: 2014-09-05 Published: 2014-09-05
  • About author: Zhi-Neng Chen received his B.S. and M.S. degrees in computer science from the College of Information Engineering, Xiangtan University, China, in 2004 and 2007, respectively, and his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2012. He is currently an assistant professor at the Institute of Automation, Chinese Academy of Sciences, Beijing. He was a senior research associate with the Department of Computer Science, City University of Hong Kong, in 2012. His current research interests include large-scale multimedia information retrieval and video processing.
  • Supported by: This work was supported by a research grant from City University of Hong Kong under Grant No. 7008178, and by the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175, and 61172153.

Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly in large-scale realistic scenarios, mainly because datasets constructed under such conditions are scarce. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75,073 Internet videos totaling over 4,000 hours, covering 2,427 celebrities and 649,001 faces. To our knowledge, this is the most comprehensive dataset for this problem. We describe the details of the dataset construction, discuss several interesting findings from analyzing the dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences