Journal of Computer Science and Technology
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology, 2017, Vol. 32, Issue 4: 738-748. DOI: 10.1007/s11390-017-1755-6
Special Issue on Deep Learning
Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering
Ai-Wen Jiang (1), Member, CCF; Bo Liu (2); Ming-Wen Wang (1,*), Senior Member, CCF
(1) College of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China
(2) College of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, U.S.A.

Abstract: Image question answering (IQA) has emerged as a promising interdisciplinary topic spanning computer vision and natural language processing. In this paper, we propose a contextually guided recurrent attention model for IQA. The model is a multimodal recurrent neural network based on deep reinforcement learning. Drawing on compositional contextual information, it recurrently decides where to look in the image using a reinforcement learning strategy. Unlike traditional "static" soft attention, it can be regarded as a kind of "dynamic" attention whose objective is optimized with reinforcement rewards designed specifically for IQA. The compositional information it finally learns incorporates both global context and informative local details, which is shown to benefit answer generation. The proposed method is compared with several state-of-the-art methods on two public IQA datasets: COCO-QA and the VQA dataset built on MS COCO. The experimental results show that the proposed model outperforms those methods.
Keywords: image question answering, recurrent attention, deep reinforcement learning, multimodal recurrent neural network, multimodal fusion
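The contrast the abstract draws between "static" soft attention and "dynamic" attention trained from rewards can be illustrated with a toy sketch. This is not the authors' network: it has no recurrence, no question encoding, and no image features. All function names, the four-region setup, and the 0/1 reward are assumptions made purely for illustration. Soft attention averages over every region, whereas the sampled variant picks one region per step and adjusts its scores with a REINFORCE-style update.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(region_feats, scores):
    """'Static' soft attention: a probability-weighted average of all regions."""
    probs = softmax(scores)
    dim = len(region_feats[0])
    return [sum(p * f[d] for p, f in zip(probs, region_feats)) for d in range(dim)]

def sample_glimpse(scores, rng):
    """'Dynamic' attention: sample a single region index from the policy."""
    probs = softmax(scores)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

def reinforce_grad(probs, idx, reward, baseline=0.0):
    """REINFORCE gradient w.r.t. the scores (logits):
    (reward - baseline) * (onehot(idx) - probs)."""
    adv = reward - baseline
    return [adv * ((1.0 if i == idx else 0.0) - p) for i, p in enumerate(probs)]

def train_policy(answer_region, n_regions=4, steps=2000, lr=0.1, seed=0):
    """Toy loop (hypothetical setup): reward is 1 when the sampled glimpse
    lands on the region assumed to contain the answer, else 0."""
    rng = random.Random(seed)
    scores = [0.0] * n_regions
    for _ in range(steps):
        idx, probs = sample_glimpse(scores, rng)
        reward = 1.0 if idx == answer_region else 0.0
        grad = reinforce_grad(probs, idx, reward)
        # Gradient ascent on expected reward.
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return softmax(scores)

if __name__ == "__main__":
    print(soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print(train_policy(answer_region=2))
```

In this toy setting the policy concentrates its probability mass on the rewarded region over time. In the model described by the abstract, the decision of where to look is additionally conditioned on compositional context and made recurrently, which this sketch does not attempt to reproduce.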
Received: 2016-12-19
Funding:

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61365002 and 61462045, and the Science and Technology Project of the Education Department of Jiangxi Province of China under Grant No. GJJ150350.

Corresponding author: Ming-Wen Wang (Email: mwwang@jxnu.edu.cn)
About the author: Ai-Wen Jiang received his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2010. He is currently an associate professor at Jiangxi Normal University, Nanchang. His research interests include vision and language, deep learning, and cross-modal retrieval.
Cite this article:
Ai-Wen Jiang, Bo Liu, Ming-Wen Wang. Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering [J]. Journal of Computer Science and Technology, 2017, 32(4): 738-748.
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1755-6