Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks

JSTSP Volume 13 Issue 4

JSTSP Featured Articles

By: Soumitro Chakrabarty; Emanuël A. P. Habets

This paper presents a time-frequency masking based online multi-channel speech enhancement approach that uses a convolutional recurrent neural network to estimate the mask. The magnitude and phase components of the short-time Fourier transform coefficients for multiple time frames are provided as input, so that the network can discriminate between the directional speech and the noise components based on the spatial characteristics of the individual signals as well as their spectro-temporal structure. The estimation of two different masks, namely the ideal ratio mask (IRM) and the ideal binary mask (IBM), is discussed along with two approaches for incorporating the mask to obtain the desired signal. In the first approach, the mask is applied directly as a real-valued gain to a reference microphone signal; in the second, the mask is used as an activity indicator for the recursive update of power spectral density (PSD) matrices used within a beamformer. The proposed system, with the two estimated masks used within the two enhancement approaches, is evaluated with both simulated and measured room impulse responses. The results show that the IBM is better suited as an indicator for the PSD updates, while direct application of the IRM as a real-valued gain leads to a larger improvement in short-time objective intelligibility (STOI). Analysis of the performance also demonstrates the robustness of the system to different angular positions of the speech source.
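
The two ways of using a mask described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 0 dB threshold for the IBM, the smoothing factor `alpha`, the mask-gated update rule, and the MVDR formulation are all assumptions chosen for illustration, since the abstract does not specify the exact update or which beamformer is used. The masks here are oracle masks computed from known speech and noise magnitudes; in the paper they are estimated by the network.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero

def ideal_masks(speech_mag, noise_mag, threshold_db=0.0):
    """Oracle IRM and IBM from speech and noise STFT magnitudes.

    Uses common textbook definitions (an assumption, not taken from the paper):
    IRM = sqrt(S^2 / (S^2 + N^2)); IBM = 1 where the local SNR exceeds a threshold.
    """
    s_pow = speech_mag ** 2
    n_pow = noise_mag ** 2
    irm = np.sqrt(s_pow / (s_pow + n_pow + EPS))
    snr_db = 10.0 * np.log10((s_pow + EPS) / (n_pow + EPS))
    ibm = (snr_db > threshold_db).astype(float)
    return irm, ibm

def apply_mask_as_gain(ref_stft, mask):
    """Approach 1: apply the mask as a real-valued gain to the reference channel."""
    return mask * ref_stft

def update_psd(psd, frame, weight, alpha=0.95):
    """Approach 2 (one possible form): mask-gated recursive PSD update.

    psd:    (F, M, M) current PSD matrix estimate per frequency bin
    frame:  (F, M) multi-channel STFT coefficients of the current time frame
    weight: (F,) activity indicator in [0, 1], e.g. the mask for the speech
            PSD and (1 - mask) for the noise PSD
    Where weight is 0 the estimate is left unchanged; where it is 1 the usual
    exponential smoothing with factor alpha is applied.
    """
    outer = frame[:, :, None] * frame[:, None, :].conj()  # x x^H per bin
    a = 1.0 - (1.0 - alpha) * weight                      # effective smoothing
    return a[:, None, None] * psd + (1.0 - a)[:, None, None] * outer

def mvdr_weights(psd_speech, psd_noise, ref=0, loading=1e-6):
    """PSD-based MVDR weights (assumed beamformer choice, illustrative only).

    w = (Phi_n^{-1} Phi_s) u_ref / trace(Phi_n^{-1} Phi_s), per frequency bin.
    Diagonal loading keeps the noise PSD matrix invertible.
    """
    n_mics = psd_noise.shape[-1]
    loaded = psd_noise + loading * np.eye(n_mics)
    num = np.linalg.solve(loaded, psd_speech)             # (F, M, M)
    trace = np.trace(num, axis1=1, axis2=2)               # (F,)
    return num[:, :, ref] / (trace[:, None] + EPS)        # (F, M)

def beamform(weights, frame):
    """Apply w^H x per frequency bin to one multi-channel frame."""
    return np.einsum("fm,fm->f", weights.conj(), frame)
```

In an online setting, each incoming frame would be passed to `update_psd` twice, once with the mask to update the speech PSD matrix and once with its complement to update the noise PSD matrix, before recomputing the weights. A binary mask makes each of those updates an all-or-nothing decision per time-frequency bin.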

Tags: IEEE JSTSP Article
