

Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval

  • Abstract:
    Background: Image-text retrieval aims to capture the semantic correspondence between images and texts. It plays an important role in bridging the semantic gap between vision and language, and it is a fundamental and crucial component of multi-modal recommendation, search systems, and online shopping. Existing mainstream methods focus primarily on modeling the association between images and texts while neglecting the beneficial effect of multi-task visual semantic constraints on image-text retrieval.
    Objective: This work imposes semantic constraints on images and texts so that the two modalities remain semantically consistent throughout network training, and further uses multi-task joint training to learn more generalizable and robust image and text feature representations, thereby improving cross-modal image-text retrieval performance.
    Methods: We propose a multi-task visual semantic embedding network, MVSEN. Two auxiliary tasks for semantic constraints, text-text matching and multi-label classification, are designed to improve the generalization and robustness of visual semantic embedding from the training perspective, and a modality interaction module is designed to discover latent alignments between images and texts (a minimal sketch of the joint training objective follows this abstract).
    Results: Experiments show that the proposed MVSEN achieves state-of-the-art results on two public datasets, Flickr30K and MSCOCO, improving the rSum metric over the best existing methods by 8.2% and 3.0%, respectively.
    Conclusion: The proposed MVSEN performs well on cross-modal image-text retrieval, and the experiments show that both the text-text matching and the multi-label classification tasks help improve retrieval performance. Furthermore, the two auxiliary tasks can be integrated into existing cross-modal image-text retrieval methods, which is of practical significance for applying multi-task learning to cross-modal retrieval.
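
    The sketch below illustrates one way the multi-task joint training objective described in the Methods could be written: the main image-text matching loss plus the two auxiliary losses (text-text matching and multi-label classification). The loss forms (hinge-based triplet ranking, binary cross-entropy), the names (MultiTaskLoss, lambda_tt, lambda_cls), and the use of an augmented text view (txt_emb_aug) are illustrative assumptions, not the paper's released implementation.

    ```python
    # Minimal sketch of a multi-task training objective (assumed forms, not MVSEN's official code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskLoss(nn.Module):
        """Main image-text matching loss plus two auxiliary semantic-constraint losses.
        The weights lambda_tt and lambda_cls are hypothetical hyperparameters."""
        def __init__(self, margin=0.2, lambda_tt=1.0, lambda_cls=1.0):
            super().__init__()
            self.margin = margin
            self.lambda_tt = lambda_tt
            self.lambda_cls = lambda_cls

        def hinge_triplet(self, a, b):
            # Bidirectional hinge-based triplet ranking loss over in-batch hardest negatives,
            # a common choice for visual semantic embedding training.
            a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
            scores = a @ b.t()                                   # cosine similarity matrix (B x B)
            pos = scores.diag().view(-1, 1)                      # matched-pair scores
            cost_a = (self.margin + scores - pos).clamp(min=0)   # anchor a, negatives in b
            cost_b = (self.margin + scores - pos.t()).clamp(min=0)
            mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
            return cost_a.masked_fill(mask, 0).max(1)[0].mean() + \
                   cost_b.masked_fill(mask, 0).max(0)[0].mean()

        def forward(self, img_emb, txt_emb, txt_emb_aug, label_logits, labels):
            l_itm = self.hinge_triplet(img_emb, txt_emb)         # image-text matching (main task)
            l_ttm = self.hinge_triplet(txt_emb, txt_emb_aug)     # text-text matching (auxiliary)
            l_cls = F.binary_cross_entropy_with_logits(          # multi-label classification (auxiliary)
                label_logits, labels.float())
            return l_itm + self.lambda_tt * l_ttm + self.lambda_cls * l_cls
    ```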

     

    Abstract: Image-text retrieval aims to capture the semantic correspondence between images and texts, which serves as a foundation and a crucial component of multi-modal recommendation, search systems, and online shopping. Existing mainstream methods primarily focus on modeling the association of image-text pairs while neglecting the advantageous impact of multi-task learning on image-text retrieval. To this end, a multi-task visual semantic embedding network (MVSEN) is proposed for image-text retrieval. Specifically, we design two auxiliary tasks, text-text matching and multi-label classification, as semantic constraints to improve the generalization and robustness of visual semantic embedding from a training perspective. In addition, we present an intra- and inter-modality interaction scheme to learn discriminative visual and textual feature representations by facilitating information flow within and between modalities. Subsequently, we utilize multi-layer graph convolutional networks in a cascading manner to infer the correlation of image-text pairs. Experimental results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MSCOCO, with rSum improvements of 8.2% and 3.0%, respectively.
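
    The abstract also mentions cascading multi-layer graph convolutional networks to infer the correlation of image-text pairs. The following is a minimal, hedged sketch under assumed details: graph nodes are image region and word features, edges come from cosine similarity, and the readout is mean pooling followed by cosine similarity; names such as GCNLayer and correlation_score are hypothetical and the construction is not the paper's exact design.

    ```python
    # Hedged sketch of cascaded GCN layers for image-text correlation (assumed graph construction).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(dim, dim)

        def forward(self, x, adj):
            # Row-normalized neighborhood aggregation, linear projection, and a residual
            # connection to keep the cascaded stack stable.
            adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
            return F.relu(self.proj(adj @ x)) + x

    def correlation_score(region_feats, word_feats, layers):
        """region_feats: (R, D) image region features; word_feats: (W, D) word features;
        layers: a list of GCNLayer modules applied in cascade."""
        nodes = F.normalize(torch.cat([region_feats, word_feats], dim=0), dim=-1)  # joint graph nodes
        adj = (nodes @ nodes.t()).clamp(min=0)        # similarity-based edge weights (assumption)
        for layer in layers:                          # cascaded multi-layer GCN
            nodes = layer(nodes, adj)
        img_vec = nodes[: region_feats.size(0)].mean(dim=0)   # image readout
        txt_vec = nodes[region_feats.size(0):].mean(dim=0)    # text readout
        return F.cosine_similarity(img_vec, txt_vec, dim=0)   # image-text correlation score

    # Example usage with random features (dimensions are arbitrary):
    layers = nn.ModuleList([GCNLayer(256) for _ in range(3)])
    score = correlation_score(torch.randn(36, 256), torch.randn(12, 256), layers)
    ```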

     

