

Volume 29 Issue 5
September  2014
Zhi-Neng Chen, Chong-Wah Ngo, Wei Zhang, Juan Cao, Yu-Gang Jiang. Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues[J]. Journal of Computer Science and Technology, 2014, 29(5): 785-798. DOI: 10.1007/s11390-014-1468-z

Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues

Funds: This work was supported by a research grant from City University of Hong Kong under Grant No. 7008178, and the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175 and 61172153.
  • Author Bio:

    Zhi-Neng Chen received his B.S. and M.S. degrees in computer science from the College of Information Engineering, Xiangtan University, China, in 2004 and 2007, respectively, and his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2012. He is currently an assistant professor at the Institute of Automation, Chinese Academy of Sciences, Beijing. He was a senior research associate with the Department of Computer Science, City University of Hong Kong, in 2012. His current research interests include large-scale multimedia information retrieval and video processing.

  • Received Date: February 23, 2014
  • Revised Date: July 02, 2014
  • Published Date: September 04, 2014
  • Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly under large-scale realistic scenarios, mainly due to the scarcity of datasets constructed under such circumstances. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75,073 Internet videos totaling over 4,000 hours, covering 2,427 celebrities and 649,001 faces. This is, to our knowledge, the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing this dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.
