2014, Vol. 29, Issue (5): 785-798. doi: 10.1007/s11390-014-1468-z

Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Issue on Advances in Computer Science and Technology (Part 2) •

Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues

Zhi-Neng Chen1,2 (陈智能), Chong-Wah Ngo2 (杨宗桦), Member, IEEE, Wei Zhang2 (张炜), Juan Cao3 (曹娟), Yu-Gang Jiang4 (姜育刚)

    1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2. Department of Computer Science, City University of Hong Kong, Hong Kong, China;
    3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    4. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2014-02-24; Revised: 2014-07-03; Online: 2014-09-05; Published: 2014-09-05
  • About the author: Zhi-Neng Chen received his B.S. and M.S. degrees in computer science from the College of Information Engineering, Xiangtan University, China, in 2004 and 2007, respectively, and his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2012. He is currently an assistant professor at the Institute of Automation, Chinese Academy of Sciences, Beijing. He was a senior research associate with the Department of Computer Science, City University of Hong Kong, in 2012. His current research interests include large-scale multimedia information retrieval and video processing.
  • Supported by:

    This work was supported by a research grant from City University of Hong Kong under Grant No. 7008178, and the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175 and 61172153.

Abstract: Associating faces that appear in Web videos with names mentioned in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly under large-scale realistic scenarios, mainly because datasets constructed under such conditions are scarce. In this paper, we introduce WebV-Cele, a Web video dataset of celebrities for name-face association. The dataset consists of 75,073 Internet videos totaling over 4,000 hours and covers 2,427 celebrities and 649,001 faces. To our knowledge, it is the most comprehensive dataset for this problem. We describe the dataset construction in detail, discuss several findings from analyzing the dataset, such as celebrity community discovery, and report experimental results for name-face association using five existing techniques. We also outline three important and challenging research problems that could be investigated on this dataset in the future.
