China Educational Publications Import & Export Corporation Ltd.

[Journal article][Full-length article]

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Authors: Sidheswar Routray; Qirong Mao

Publication year: 2022

Pages: 101270

Publisher: Elsevier BV

Abstract:

We propose PSMGAN, an efficient phase sensitive masking-based single-channel speech enhancement technique using a conditional generative adversarial network (cGAN). Time–frequency (T-F) masking-based speech enhancement approaches using deep neural networks (DNNs) have shown large improvements in speech intelligibility. However, these approaches fail to achieve good enhancement results at low signal-to-noise ratio (SNR) conditions because they ignore phase information during reconstruction. Separately, GANs have been applied effectively to speech enhancement and have achieved improved performance owing to adversarial training. Motivated by the recent success of GANs, we introduce phase sensitive masking (PSM) into a cGAN framework for the speech enhancement task. We choose a conditional generative model because the data generation process can be controlled using additional temporal context information. In addition, we use gradient penalty regularization in the discriminator of the cGAN to avoid the vanishing gradient problem, which in turn stabilizes training and increases the quality of the generated samples. We use the PSM because it involves both amplitude and phase information and produces an improved estimate of the clean speech signal, with higher SNR than other T-F masks. Experimental results show that the proposed PSM-based cGAN architecture significantly improves quality and intelligibility measures compared with baselines such as SEGAN, Deep Feature Loss, MetricGAN, AECNN, DNN-cIRM, and an end-to-end approach.
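To make the role of phase concrete, the following is a minimal NumPy sketch of the phase sensitive mask as it is commonly defined in the speech enhancement literature: the clean-to-noisy magnitude ratio scaled by the cosine of the phase difference, computed per T-F bin. The abstract does not give the exact formulation used in PSMGAN, so the function names, the `eps` stabilizer, and the truncation of the mask to [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def phase_sensitive_mask(clean_stft, noisy_stft, eps=1e-8, clip=(0.0, 1.0)):
    """Compute PSM = |S| / |Y| * cos(theta_S - theta_Y) for each T-F bin.

    clean_stft, noisy_stft: complex arrays of the same shape (hypothetical
    STFTs of the clean and noisy signals).
    eps: small constant to avoid division by zero (an assumption).
    clip: optional (low, high) truncation range (an assumption); pass None
    to keep the unbounded mask.
    """
    clean_stft = np.asarray(clean_stft, dtype=complex)
    noisy_stft = np.asarray(noisy_stft, dtype=complex)
    # Re(S * conj(Y)) equals |S||Y| cos(theta_S - theta_Y), so dividing by
    # |Y|^2 yields the magnitude ratio times the cosine of the phase gap.
    mask = np.real(clean_stft * np.conj(noisy_stft)) / (np.abs(noisy_stft) ** 2 + eps)
    if clip is not None:
        mask = np.clip(mask, clip[0], clip[1])
    return mask


def apply_mask(mask, noisy_stft):
    """Estimate the clean spectrum by scaling the noisy STFT, reusing the noisy phase."""
    return mask * np.asarray(noisy_stft, dtype=complex)
```

In this sketch, a bin whose clean and noisy components are in phase receives a mask equal to the plain magnitude ratio, while a bin whose phases differ by 90 degrees receives a mask of zero. That is the sense in which the mask accounts for phase even though it is real-valued.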

Keywords:

Single channel speech enhancement; Phase sensitive mask; Deep learning; Conditional generative adversarial network (cGAN); Adversarial training

Journal

Computer Speech & Language

ISSN: 0885-2308

Source: Elsevier BV