[Journal article][Full-length article]


Affective question answering on video

Authors:
Nelson Ruwa; Qirong Mao; Liangjun Wang; Jianping Gou

Publication year: 2019

Pages: 125-139
Publisher: Elsevier BV


Abstract:

Visual Question Answering (VQA) is an increasingly popular research area in machine learning. Most existing VQA tasks focus only on static images, and only a few models are based on videos. The primary purpose of this project is to develop an innovative model that performs Affective Question Answering on Video (AQAV), a multi-tasking architecture that implements a Video QA route and an Affective route. A pre-trained CNN emotion detector recognizes emotions on the frames of a video, and a string of the emotion labels is relayed to the Token-based, Frame-based and Integrated attention mechanisms. The attention model uses the visual features, the question and the emotion labels to focus on relevant frames of the video and relevant regions within those frames. The string of emotion labels is also used to generate an emotion caption, which the Text QA module uses to prepare an affective answer. A conventional answer is generated from processes along the Video QA route, while the affective answer is a product of both the Video QA and the Affective routes. Our model not only makes VQA more analytic by generating an explanatory answer, but also registers a quantitative improvement in performance compared with previous baselines. We show that injecting emotions into the attention mechanism boosts VQA performance. The AQAV model contributes towards efforts to make machines understand sequential and dynamic visual scenes in the real world.
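The abstract describes attention over video frames conditioned jointly on the question and the detected emotion labels. Below is a minimal PyTorch sketch of one such emotion-conditioned, frame-based attention step; it is not the authors' implementation, and all class names, projection layers, and feature dimensions are illustrative assumptions.

```python
# Sketch of frame-level attention that fuses visual, question and emotion cues.
# Assumes precomputed per-frame CNN features, an encoded question vector, and an
# embedding of the emotion-label string (all shapes are hypothetical).
import torch
import torch.nn as nn


class EmotionConditionedFrameAttention(nn.Module):
    """Scores each frame using visual, question and emotion features."""

    def __init__(self, visual_dim, question_dim, emotion_dim, hidden_dim):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.question_proj = nn.Linear(question_dim, hidden_dim)
        self.emotion_proj = nn.Linear(emotion_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, frame_feats, question_feat, emotion_feat):
        # frame_feats: (B, T, visual_dim); question_feat: (B, question_dim);
        # emotion_feat: (B, emotion_dim)
        v = self.visual_proj(frame_feats)                       # (B, T, H)
        q = self.question_proj(question_feat).unsqueeze(1)      # (B, 1, H)
        e = self.emotion_proj(emotion_feat).unsqueeze(1)        # (B, 1, H)
        scores = self.score(torch.tanh(v + q + e)).squeeze(-1)  # (B, T)
        weights = torch.softmax(scores, dim=-1)                 # attention over frames
        # Weighted sum of frame features -> single video representation.
        attended = torch.bmm(weights.unsqueeze(1), frame_feats).squeeze(1)
        return attended, weights


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real CNN / encoder outputs.
    attn = EmotionConditionedFrameAttention(
        visual_dim=2048, question_dim=512, emotion_dim=64, hidden_dim=256)
    frames = torch.randn(2, 20, 2048)   # 20 sampled frames per video
    question = torch.randn(2, 512)      # encoded question
    emotions = torch.randn(2, 64)       # embedded emotion-label string
    video_vec, frame_weights = attn(frames, question, emotions)
    print(video_vec.shape, frame_weights.shape)  # (2, 2048) and (2, 20)
```

The attended video vector would then feed the answer-generation side of the Video QA route, while the same emotion labels drive the caption used for the affective answer; the exact fusion and the Token-based and Integrated attention variants follow the paper, not this sketch.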



Keywords:

Video question answering; Emotion detection; Video captioning; Multi-task learning


Journal:
Neurocomputing
ISSN: 0925-2312
From: Elsevier BV