[Journal Article]


Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis

Authors:
Jiamin Fu; Qirong Mao; Juanjuan Tu; Yongzhao Zhan

Year: 2019

Pages: 451–461
Publisher: Springer Nature


Abstract:

Multimodal emotion recognition is a challenging research topic that has recently begun to attract the attention of the research community. To better recognize the emotions of video users, multimodal emotion recognition based on audio and video is essential. Its performance heavily depends on finding a good shared feature representation, which must satisfy two requirements: (1) it preserves the characteristics of each modality, and (2) it balances the contributions of the different modalities so that the final decision is optimal. In light of these requirements, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis (En-SLDCCA) approach to learn the multimodal shared feature representation. Learning this representation involves two stages. In the first stage, we pretrain a Sparse Auto-Encoder on unimodal video (or audio) data, obtaining hidden feature representations of video and audio separately. In the second stage, we compute the correlation coefficients between video and audio using our En-SLDCCA approach, and then form the shared feature representation by fusing the video and audio features with these coefficients. We evaluate our method on the challenging multimodal eNTERFACE'05 database. Experimental results show that our method outperforms unimodal video (or audio) and significantly improves multimodal emotion recognition performance compared with the current state of the art.
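The two-stage pipeline described above can be illustrated with a minimal sketch. The code below uses plain CCA from scikit-learn as a stand-in for the paper's En-SLDCCA (the sparse and local-discriminative enhancements are not reproduced here), and the input matrices are assumed to be the hidden representations already produced by the pretrained Sparse Auto-Encoders; all shapes and variable names are hypothetical.

```python
# Minimal sketch of the abstract's fusion stage, assuming standard CCA
# in place of En-SLDCCA and random arrays in place of SAE hidden features.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_samples = 200
X_video = rng.standard_normal((n_samples, 128))  # hypothetical SAE hidden features (video)
X_audio = rng.standard_normal((n_samples, 64))   # hypothetical SAE hidden features (audio)

# Stage 2: learn projections that maximize the correlation between
# the two modalities' feature spaces.
cca = CCA(n_components=32)
Z_video, Z_audio = cca.fit_transform(X_video, X_audio)

# Form a shared representation from the correlated components;
# a classifier would then be trained on Z_shared.
Z_shared = np.hstack([Z_video, Z_audio])
print(Z_shared.shape)  # (200, 64)
```

In the paper's actual method, the projections additionally incorporate sparsity and local discriminative structure, and the correlation coefficients weight the fused features rather than simple concatenation.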



Keywords:

Multimodal emotion recognition; Multimodal shared feature learning; Multimodal information fusion; Canonical correlation analysis


Journal
Multimedia Systems
ISSN: 0942-4962
Source: Springer Nature