[Journal article][Full-length article]


Mood-aware visual question answering

Authors:
Nelson Ruwa; Qirong Mao; Liangjun Wang; Jianping Gou; Ming Dong

Year: 2019

Pages: 305-316
Publisher: Elsevier BV


Abstract:

Visual Question Answering (VQA) has recently attracted the attention of many researchers in the field of machine learning. Different attention models have been proposed in VQA to address the need to focus on local regions of an image. This paper proposes the concept of Mood-Aware Visual Question Answering (MAVQA), using novel long short-term memory (LSTM) and convolutional neural network (CNN) attention models that combine the local image features, the question, and the mood detected from the particular region of the image to produce a mood-based answer using a pre-processed image dataset. The attention mechanisms enable the VQA model to focus only on the parts of the image that are relevant to both the detected mood and the key words in the question. The irrelevant parts of the image are ignored, which improves classification accuracy by reducing the chances of predicting wrong answers. Whereas previous efforts have utilized CNNs mostly for the embedding of images and text, we formulate a CNN attention algorithm for the image, question and mood. When the number of views and the kernel length are optimized, the more direct convolutional attention operation is more efficient and effective than the winding recurrent LSTM attention operation. The experimental results prove that MAVQA is effectively mood-aware, and the accuracy levels of our LSTM attention model are well within the range of previous conventional VQA benchmarks, while our novel CNN attention model outperforms the previous baselines in several instances. The additional attention on the mood not only improves classification accuracy but also substantially contributes to the analysis and comprehension of image features, a key development in modern artificial intelligence.



Keywords:

Mood-aware; Visual question answering; Attention model; Long short term memory; Convolutional neural network


Journal:
Neurocomputing
ISSN: 0925-2312
Source: Elsevier BV