Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering
Journal of Computer Science and Technology
Journal of Computer Science and Technology 2017, Vol. 32 Issue (4) :738-748    DOI: 10.1007/s11390-017-1755-6
Special Issue on Deep Learning
Ai-Wen Jiang (1), Member, CCF; Bo Liu (2); Ming-Wen Wang (1,*), Senior Member, CCF
(1) College of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China
(2) College of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, U.S.A.

Abstract: Image question answering (IQA) has emerged as a promising interdisciplinary topic spanning computer vision and natural language processing. In this paper, we propose a contextually guided recurrent attention model for IQA. It is a multimodal recurrent neural network based on deep reinforcement learning. Guided by compositional contextual information, it recurrently decides where to look using a reinforcement learning strategy. Unlike traditional "static" soft attention, it can be regarded as a form of "dynamic" attention whose objective is built on reinforcement rewards designed specifically for IQA. The compositional information finally learned incorporates both global context and informative local details, which is shown to benefit answer generation. The proposed method is compared with several state-of-the-art methods on two public IQA datasets, COCO-QA and VQA, both derived from the MS COCO dataset. The experimental results demonstrate that the proposed model outperforms those methods.
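The contrast the abstract draws between "static" soft attention and reward-driven "dynamic" attention can be illustrated with a toy sketch. This is not the authors' network: the three-region setup, the reward rule, and all function names (`soft_attention`, `sample_glimpse`, `reinforce_grad`, `train_toy_policy`) are hypothetical and chosen only for illustration. The sketch assumes a softmax policy over image regions and the standard REINFORCE gradient estimator.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_feats, scores):
    """'Static' soft attention: deterministic weighted average of all regions."""
    probs = softmax(scores)
    dim = len(region_feats[0])
    return [sum(p * f[d] for p, f in zip(probs, region_feats)) for d in range(dim)]


def sample_glimpse(scores, rng):
    """'Dynamic' hard attention: sample a single region index from the policy."""
    probs = softmax(scores)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_grad(probs, idx, reward, baseline=0.0):
    """One-sample REINFORCE estimate of d E[reward] / d scores.

    For a softmax policy, d log p(idx) / d score_j = 1[j == idx] - p_j.
    """
    advantage = reward - baseline
    return [advantage * ((1.0 if j == idx else 0.0) - p) for j, p in enumerate(probs)]


def train_toy_policy(steps=300, lr=0.5, seed=0):
    """Toy assumption: looking at region 2 yields the correct answer (reward 1)."""
    rng = random.Random(seed)
    scores = [0.0, 0.0, 0.0]  # policy logits, one per image region
    for _ in range(steps):
        idx, probs = sample_glimpse(scores, rng)
        reward = 1.0 if idx == 2 else 0.0
        grad = reinforce_grad(probs, idx, reward)
        # Gradient ascent on the expected reward.
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return softmax(scores)


if __name__ == "__main__":
    print("soft context:", soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print("learned glimpse policy:", train_toy_policy())
```

In this sketch, soft attention always blends every region, whereas the sampled policy shifts its probability mass toward the region that earns reward. A recurrent model of the kind described in the abstract would additionally condition each glimpse on the question and on previously attended regions, which is omitted here.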
Keywords: image question answering; recurrent attention; deep reinforcement learning; multimodal recurrent neural network; multimodal fusion
Received: 2016-12-19
Fund: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61365002 and 61462045, and the Science and Technology Project of the Education Department of Jiangxi Province of China under Grant No. GJJ150350.

Corresponding Author: Ming-Wen Wang (Email: mwwang@jxnu.edu.cn)
Cite this article:
Ai-Wen Jiang, Bo Liu, Ming-Wen Wang. Deep Multimodal Reinforcement Network with Contextually Guided Recurrent Attention for Image Question Answering[J]. Journal of Computer Science and Technology, 2017, 32(4): 738-748
URL: http://jcst.ict.ac.cn:8080/jcst/EN/10.1007/s11390-017-1755-6