[Journal Article][Full-length article]


Triple attention network for sentimental visual question answering

Authors:
Nelson Ruwa; Qirong Mao; Heping Song; Hongjie Jia; Ming Dong

Year: 2019

Pages: 102829
Publisher: Elsevier BV


Abstract:

Visual Question Answering (VQA) and Visual Sentiment Analysis (VSA) are popular recent research fields in deep-learning-based multimedia analysis, but little effort has been made to close the gap between them. Better image understanding can be achieved by analyzing sentimental attributes drawn from different regions of an image. This paper proposes the Triple Attention Network (TANet), which attends to the features of an image, a text question, and a set of distinct localized sentimental visual attributes in a triple attention mechanism, in order to generate a fully affective answer. Separate experiments demonstrate how two customized image datasets can be used to train a VQA model that employs long short-term memory (LSTM) and convolutional neural network (CNN) feature-attention techniques for the text question and the sentimental attributes. The additional attention to the sentimental attributes causes the model to focus on more relevant regions of the image, which yields better image understanding and a higher-quality answer. The Hadamard product is modified to handle the three attended variables during feature fusion. The experimental results show that high classification accuracy can be achieved together with a multi-attribute affective answer, and that our model outperforms recent VSA and VQA baseline models. The proposed model is a significant step towards machines that comprehend natural language as humans do.
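
The abstract's fusion step, an element-wise (Hadamard) product extended to three attended feature streams, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the projection layers, feature dimensions, tanh nonlinearity, and classifier head are all assumptions; only the three-way Hadamard fusion of attended image, question, and sentiment-attribute features follows the abstract.

import torch
import torch.nn as nn

class TripleHadamardFusion(nn.Module):
    """Fuse attended image, question, and sentiment-attribute features by
    projecting each into a shared space and taking their element-wise
    (Hadamard) product, extending the usual two-way multimodal fusion.
    Hypothetical sketch; not the TANet architecture from the paper."""

    def __init__(self, img_dim, q_dim, attr_dim, fused_dim, num_answers):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, fused_dim)
        self.proj_q = nn.Linear(q_dim, fused_dim)
        self.proj_attr = nn.Linear(attr_dim, fused_dim)
        self.classifier = nn.Linear(fused_dim, num_answers)

    def forward(self, v_img, v_q, v_attr):
        # Element-wise product over all three modalities; tanh bounds the
        # activations before fusion (a common choice, assumed here).
        fused = (torch.tanh(self.proj_img(v_img))
                 * torch.tanh(self.proj_q(v_q))
                 * torch.tanh(self.proj_attr(v_attr)))
        return self.classifier(fused)

# Example usage with hypothetical feature sizes:
model = TripleHadamardFusion(img_dim=2048, q_dim=1024, attr_dim=300,
                             fused_dim=512, num_answers=1000)
v_img = torch.randn(4, 2048)   # attended CNN image features
v_q = torch.randn(4, 1024)     # attended LSTM question features
v_attr = torch.randn(4, 300)   # attended sentiment-attribute features
logits = model(v_img, v_q, v_attr)  # (4, 1000) answer scores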



Keywords:

Visual question answering; Feature embedding; Attention model; Sentiment analysis


Journal:
Computer Vision and Image Understanding
ISSN: 1077-3142