Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (2): 287-304. doi: 10.1007/s11390-019-1911-2

Special Topic: Artificial Intelligence and Pattern Recognition

CATIRI: An Efficient Method for Content-and-Text Based Image Retrieval

Mengqi Zeng1, Bin Yao1,*, Member, CCF, ACM, IEEE, Zhi-Jie Wang2,3,4, Member, CCF, ACM, Yanyan Shen1, Feifei Li5, Senior Member, IEEE, Member, ACM, Jianfeng Zhang6, Hao Lin6, Minyi Guo1, Fellow, CCF, IEEE, Member, ACM   

    1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2 School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China;
    3 Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China;
    4 National Engineering Laboratory for Big Data Analysis and Applications, Beijing 100871, China;
    5 School of Computing, University of Utah, Salt Lake City 84112, U.S.A.;
    6 Alibaba Group, Hangzhou 311121, China
  • Received: 2018-07-09; Revised: 2019-01-24; Online: 2019-03-05; Published: 2019-03-16
  • Contact: Bin Yao, E-mail: yaobin@cs.sjtu.edu.cn
  • About the author: Mengqi Zeng is working toward her Master's degree in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, Shanghai. Her research interests include information retrieval, databases, and distributed computing.
  • Supported by:
    This work was supported by the National Basic Research 973 Program of China under Grant No. 2015CB352403, the National Key Research and Development Program of China under Grant Nos. 2018YFC1504504, 2016YFB0700502 and 2018YFB1004400, the National Natural Science Foundation of China under Grant Nos. 61872235, 61729202, 61832017, U1636210, 61832013, 61672351, 61472453, 61702320, U1401256, U1501252, U1611264, U1711261, U1711262, U61811264, and Guangdong Province Key Laboratory of Popular High Performance Computers of Shenzhen University under Grant No. SZU-GDPHPCL2017.

Abstract: Combining visual and textual information in image retrieval substantially alleviates the semantic gap of traditional image retrieval methods, and has therefore attracted much attention recently. Image retrieval based on such a combination is usually called content-and-text based image retrieval (CTBIR). Existing studies in CTBIR, however, mainly focus on improving retrieval quality; to the best of our knowledge, little attention has been paid to retrieval efficiency. Because image data is now widespread and growing rapidly, retrieval efficiency is an important problem to investigate. To this end, this paper presents an efficient image retrieval method named CATIRI (content-and-text based image retrieval using indexing). CATIRI follows a three-phase solution framework built around a new indexing structure called the MHIM-tree, which integrates Manhattan hashing, an inverted index, and an M-tree. To use the MHIM-tree effectively at query time, we present a set of important metrics and reveal their inherent properties. Based on the MHIM-tree and these metrics, we develop a top-k query algorithm for CTBIR. Experimental results on benchmark image datasets demonstrate that CATIRI outperforms its competitors in retrieval efficiency by nearly an order of magnitude.

Key words: image retrieval, text-and-visual feature, indexing, top-k
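
To make the CTBIR problem setting concrete, the following is a minimal brute-force sketch of a top-k query that combines a visual distance with a textual dissimilarity. It is not the MHIM-tree or the CATIRI algorithm, whose details are not given in this abstract. The function names, the weighting parameter `alpha`, the Jaccard-based text score, the bucket quantization, and the toy data are all assumptions made for illustration only.

```python
import heapq
from collections import defaultdict


def quantize(vec, bits=2, lo=0.0, hi=1.0):
    """Map each feature in [lo, hi] to an integer bucket in [0, 2**bits - 1].

    Assumption: a simple uniform quantizer standing in for a learned hash.
    """
    levels = 2 ** bits
    codes = []
    for x in vec:
        t = (min(max(x, lo), hi) - lo) / (hi - lo)
        codes.append(min(int(t * levels), levels - 1))
    return codes


def manhattan(a, b):
    """Manhattan (L1) distance between two integer codes of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_inverted_index(images):
    """Map each keyword to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, (_, keywords) in images.items():
        for word in keywords:
            index[word].add(image_id)
    return index


def top_k(images, query_vec, query_words, k=2, alpha=0.5, bits=2):
    """Return the k lowest-scoring (score, image_id) pairs.

    score = alpha * visual distance + (1 - alpha) * textual dissimilarity,
    both normalized to [0, 1]. Hypothetical scoring, not the paper's metric.
    """
    inverted = build_inverted_index(images)
    # Only images sharing at least one query keyword are scored.
    candidates = set()
    for word in query_words:
        candidates |= inverted.get(word, set())

    q_code = quantize(query_vec, bits)
    max_dist = max(1, (2 ** bits - 1) * len(q_code))
    q_words = set(query_words)

    scored = []
    for image_id in candidates:
        vec, keywords = images[image_id]
        visual = manhattan(quantize(vec, bits), q_code) / max_dist
        words = set(keywords)
        textual = 1.0 - len(words & q_words) / len(words | q_words)
        scored.append((alpha * visual + (1 - alpha) * textual, image_id))
    return heapq.nsmallest(k, scored)


# Toy data: image id -> (visual feature vector, keyword list).
images = {
    "a": ([0.1, 0.9], ["beach", "sea"]),
    "b": ([0.8, 0.2], ["beach", "city"]),
    "c": ([0.1, 0.8], ["forest"]),
}
result = top_k(images, [0.1, 0.9], ["beach", "sea"], k=2)
print([image_id for _, image_id in result])
```

This sketch scores every candidate that shares a keyword with the query, so its cost grows with the number of candidates. An index such as the one the abstract describes would aim to prune most of them before scoring.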
