[Journal article][Full-length article]


Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Authors:
Sidheswar Routray; Qirong Mao

Year of publication: 2022

Pages: 101270 - 101270
Publisher: Elsevier BV


Abstract:

We propose PSMGAN, an efficient phase sensitive masking-based single-channel speech enhancement technique using a conditional generative adversarial network (cGAN). Time–frequency (T-F) masking-based speech enhancement approaches using deep neural networks (DNNs) have shown large improvements in speech intelligibility. However, these approaches perform poorly at low signal-to-noise ratio (SNR) conditions because they ignore phase information during reconstruction. Generative adversarial networks (GANs) have also been applied effectively to speech enhancement and achieve improved performance owing to adversarial training. Motivated by the recent success of GANs, we introduce phase sensitive masking (PSM) into a cGAN framework for the speech enhancement task. We choose a conditional generative model because the data generation process can be controlled through additional temporal context information. In addition, we apply gradient penalty regularization in the discriminator of the cGAN to avoid the vanishing gradient problem, which in turn stabilizes training and increases the quality of the generated samples. We use the PSM because it incorporates both amplitude and phase information and produces an improved estimate of the clean speech signal, with higher SNR than other T-F masks. Experimental results show that the proposed PSM-based cGAN architecture significantly improves performance measures over baselines such as SEGAN, Deep Feature Loss, MetricGAN, AECNN, DNN-cIRM, and an end-to-end approach, in terms of both quality and intelligibility.
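The abstract does not state the mask formula, so the following is only a minimal sketch of how a phase sensitive mask is commonly defined in the literature: for each T-F bin, PSM = (|S| / |Y|) · cos(θ_S − θ_Y), where S and Y are the clean and noisy complex spectra. The function names, the `eps` guard, and the clipping of the mask to [0, 1] are illustrative assumptions, not details taken from the paper; in PSMGAN the mask would be estimated by the generator rather than computed from the clean signal.

```python
import cmath
import math


def psm_bin(clean, noisy, eps=1e-8, lo=0.0, hi=1.0):
    """Phase sensitive mask for one T-F bin (commonly used definition).

    clean, noisy: complex STFT coefficients of clean and noisy speech.
    eps, lo, hi are illustrative assumptions (divide-by-zero guard and
    truncation range), not values from the paper.
    """
    magnitude_ratio = abs(clean) / (abs(noisy) + eps)
    phase_difference = cmath.phase(clean) - cmath.phase(noisy)
    value = magnitude_ratio * math.cos(phase_difference)
    # Truncate so the mask stays in a bounded range.
    return min(max(value, lo), hi)


def phase_sensitive_mask(clean_spec, noisy_spec):
    """Apply psm_bin over a 2-D spectrogram given as nested lists."""
    return [
        [psm_bin(s, y) for s, y in zip(clean_row, noisy_row)]
        for clean_row, noisy_row in zip(clean_spec, noisy_spec)
    ]


def apply_mask(mask, noisy_spec):
    """Scale each noisy bin by the mask; the noisy phase is reused."""
    return [
        [m * y for m, y in zip(mask_row, noisy_row)]
        for mask_row, noisy_row in zip(mask, noisy_spec)
    ]


# Toy example: one frame, three frequency bins.
noisy = [[1 + 1j, 2 + 0j, 1 + 0j]]
clean = [[0.5 + 0.5j, -2j, -1 + 0j]]
mask = phase_sensitive_mask(clean, noisy)
enhanced = apply_mask(mask, noisy)
```

In the toy example, the first bin has matching phase, so the mask reduces to the magnitude ratio. The second bin is 90 degrees out of phase, so the mask is close to zero. The third bin is in anti-phase, so the raw value is negative and is truncated to the lower bound. This phase-dependent attenuation is what distinguishes the PSM from magnitude-only masks.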



Keywords:

Single channel speech enhancement ; Phase sensitive mask ; Deep learning ; Conditional generative adversarial network (cGAN) ; Adversarial training


Journal:
Computer Speech & Language
ISSN: 0885-2308
Source: Elsevier BV