SPS SLTC/AASP TC Webinar: Diffusion Models for Speech Enhancement and Restoration
Date: 19 June 2024
Time: 8:30 AM ET (New York Time)
Presenter(s): Dr. Timo Gerkmann
Abstract
Today, speech communication devices for telephony, video telephony, and assistive listening are broadly used, by many of us daily. Often, speech communication is disturbed by acoustic or transmission artifacts. Acoustic artifacts include background noise and reverberation; transmission artifacts include, e.g., coding artifacts such as acoustic bandwidth reduction or packet loss. Speech enhancement and restoration address the reduction of acoustic and transmission artifacts. While predictive approaches have traditionally been used, generative approaches, particularly diffusion models, have recently gained increasing interest. These generative approaches achieve remarkable perceived speech quality across a broad range of acoustic disturbances and transmission artifacts.
In this talk, the presenter will introduce diffusion models for speech enhancement and restoration. He will start by explaining the underlying concept and then explain how powerful approaches like SGMSE+ differ from vanilla diffusion models by integrating environmental noise into the stochastic differential equation that describes the forward and backward diffusion processes. Besides the strengths of diffusion models, he will also highlight current research topics that address reducing computational complexity and hallucinations in challenging situations.
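For readers who want some intuition before the talk, the idea of integrating environmental noise into the forward process can be sketched roughly as follows. This is only an illustrative toy, not the presenter's implementation: it assumes an Ornstein-Uhlenbeck-type forward SDE whose drift pulls the clean signal `x0` toward the noisy recording `y` while Gaussian noise is added, which is one form described in the SGMSE+ literature. The function names and the values of `gamma`, `sigma_min`, and `sigma_max` are hypothetical, and plain Python lists stand in for spectrogram coefficients.

```python
import math
import random


def forward_mean(x0, y, t, gamma=1.5):
    """Mean of the perturbed state at time t in [0, 1].

    Assumed drift gamma * (y - x): the mean starts at the clean signal x0
    and moves exponentially toward the noisy signal y.
    """
    w = math.exp(-gamma * t)
    return [w * a + (1.0 - w) * b for a, b in zip(x0, y)]


def forward_std(t, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """Standard deviation of the Gaussian perturbation at time t.

    Closed form for the assumed diffusion coefficient
    g(t) = sigma_min * (sigma_max / sigma_min)**t * sqrt(2 * log(sigma_max / sigma_min)).
    """
    log_ratio = math.log(sigma_max / sigma_min)
    var = (
        sigma_min ** 2
        * ((sigma_max / sigma_min) ** (2 * t) - math.exp(-2 * gamma * t))
        * log_ratio
        / (gamma + log_ratio)
    )
    return math.sqrt(var)


def sample_forward(x0, y, t, rng, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """Draw one sample of the perturbed state x_t given clean x0 and noisy y."""
    mean = forward_mean(x0, y, t, gamma)
    std = forward_std(t, gamma, sigma_min, sigma_max)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [1.0, -0.5, 0.25]   # toy "clean speech" coefficients
    noisy = [1.4, -0.1, 0.60]   # the same coefficients with environmental noise
    for t in (0.0, 0.5, 1.0):
        print(t, forward_mean(clean, noisy, t), forward_std(t))
```

In a vanilla diffusion model the forward process typically ends in pure Gaussian noise, independent of any recording. In this sketch the process instead ends near the noisy recording plus Gaussian noise, so a backward process can start from the observed noisy speech.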
Biography
Timo Gerkmann is currently a Professor for Signal Processing with the Universität Hamburg, Hamburg, Germany. He has held positions with Technicolor Research & Innovation; University of Oldenburg, Oldenburg, Germany; KTH Royal Institute of Technology, Stockholm, Sweden; Ruhr-Universität Bochum, Bochum, Germany; and Siemens Corporate Research, Princeton, NJ, USA.
His research interests include statistical signal processing and machine learning for speech and audio applied to communication devices, hearing instruments, audio-visual media, and human-machine interfaces.
Dr. Gerkmann was the recipient of the VDE ITG award 2022. He served on the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing and is currently a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing.