SPS SLTC/AASP TC Webinar: Enhancing Speech Quality: Modern Techniques in Dereverberation
Date: 14 November 2024
Time: 7:30 AM ET (New York Time)
Presenter(s): Dr. Tomohiro Nakatani
Abstract
A speech signal captured in an enclosed space, such as a conference room, will inevitably contain reverberant components due to reflections from the walls, floor, and ceiling. These components degrade the perceived quality of the speech signal and cause problems in applications such as hands-free teleconferencing and automatic speech recognition (ASR). The goal of “dereverberation” is to reduce these components while preserving the direct signal, thereby minimizing their detrimental effects.
Until the early 2000s, dereverberation was considered a very difficult problem, often likened to the Holy Grail due to its fundamental importance. However, recent advancements in signal processing and machine learning have made it a solvable problem. This webinar will provide an overview of how this problem has been addressed to date and what challenges remain.
The presenter will start by addressing the fundamental challenges of dereverberation and then focus on two effective approaches. The first approach is a microphone array signal processing technique based on multi-channel linear prediction, known as the Weighted Prediction Error (WPE) method. WPE can estimate an inverse filter that cancels the effects of room impulse responses without prior knowledge of the recording conditions, making it a versatile preprocessing tool for various speech applications. The second approach uses neural networks (NNs). This webinar will demonstrate how effectively an emerging NN technique, diffusion model-based speech enhancement, can solve the problem of joint denoising and dereverberation, especially when combined with WPE. Attendees will learn how effective these techniques are at solving real-world problems and will gain an understanding of the upcoming challenges in this field.
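To make the idea of "multi-channel linear prediction" more concrete, here is a minimal NumPy sketch of the core WPE iteration for a single STFT frequency bin. It assumes the commonly published formulation: the late reverberation in the current frame is predicted from frames at least `delay` frames in the past, and the prediction filter is estimated by least squares weighted by a time-varying estimate of the desired signal's power. STFT analysis and synthesis are omitted (in practice the routine runs independently for every frequency bin), and the function and parameter names (`wpe_one_bin`, `taps`, `delay`, `iterations`) are illustrative choices, not the presenter's or NTT's implementation.

```python
import numpy as np


def build_delayed_stack(Y, taps, delay):
    """Stack delayed copies of a (D, T) STFT bin into a (D*taps, T) matrix.

    Row block k holds Y shifted right by (delay + k) frames, zero-padded at
    the start, so column t contains Y[:, t - delay - k].
    """
    D, T = Y.shape
    stack = np.zeros((D * taps, T), dtype=Y.dtype)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            stack[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]
    return stack


def wpe_one_bin(Y, taps=10, delay=3, iterations=3, eps=1e-10):
    """Minimal WPE sketch for one frequency bin (illustrative, not a reference).

    Y: complex array of shape (D, T), i.e. microphones x frames.
    Returns the dereverberated estimate with the same shape.
    """
    Y_tilde = build_delayed_stack(Y, taps, delay)  # (D*taps, T) past frames
    X = Y.copy()  # initial estimate of the desired (direct + early) signal
    for _ in range(iterations):
        # Time-varying power of the desired signal, averaged over channels.
        power = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)  # (T,)
        weighted = Y_tilde / power            # divide each frame by its power
        R = weighted @ Y_tilde.conj().T       # weighted correlation (DK x DK)
        P = weighted @ Y.conj().T             # weighted cross-correlation (DK x D)
        G = np.linalg.solve(R, P)             # multi-channel prediction filter
        X = Y - G.conj().T @ Y_tilde          # subtract predicted late reverb
    return X
```

The delay keeps the filter from predicting (and cancelling) the direct sound and early reflections, which is what lets the method preserve the direct signal. Weighting by the estimated power, rather than using plain least squares, reflects the assumption that speech has a strongly time-varying variance.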
Biography
Tomohiro Nakatani (Fellow, IEEE) received the B.E., M.E., and Ph.D. degrees from Kyoto University, Kyoto, Japan, in 1989, 1991, and 2002, respectively.
He is currently a Senior Distinguished Researcher at Communication Science Laboratories, NTT Corporation, Japan. In 2005, he was a Visiting Scholar at the Georgia Institute of Technology, USA, and from 2008 to 2017, he served as a Visiting Associate Professor in the Department of Media Science at Nagoya University, Japan. Since joining NTT Corporation as a Researcher in 1991, he has focused on developing audio signal processing technologies for intelligent human-machine interfaces, including dereverberation, denoising, source separation, and robust automatic speech recognition (ASR).
Dr. Nakatani served as an Associate Editor for the IEEE Transactions on Audio, Speech, and Language Processing from 2008 to 2010. He was a member of the IEEE SPS Audio and Acoustics Technical Committee from 2009 to 2014, the IEEE SPS Speech and Language Processing Technical Committee from 2016 to 2021, and the IEEE SPS Fellow Evaluation Committee in 2024. He co-chaired the 2014 REVERB Challenge Workshop and was a General Co-Chair of IEEE ASRU 2017. His accolades include the 2005 IEICE Best Paper Award, the 2009 ASJ Technical Development Award, the 2012 Japan Audio Society Award, an Honorable Mention for the 2015 IEEE ASRU Best Paper Award, the 2017 Maejima Hisoka Award, and the 2018 IWAENC Best Paper Award.