SPS Webinar: Generative Audio Restoration in Multimodal Applications

Date: 22 April 2025
Time: 11:00 AM ET (New York Time)
Presenter(s): Mr. Julius Richter

Based on the IEEE Xplore® article: 
"Speech Enhancement and Dereverberation With Diffusion-Based Generative Models"
Published: IEEE/ACM Transactions on Audio, Speech, and Language Processing, June 2023.

Download article: The original article will be made publicly available for download for 48 hours, starting on the day of the webinar.

Abstract

The demand for high sound quality is increasing in both entertainment and communications. Consequently, audio restoration algorithms play a critical role in mitigating distortions and interference that originate from recording processes or arise from imperfect transmission pipelines. This webinar offers an in-depth examination of generative audio restoration algorithms, with a particular focus on diffusion-based techniques for speech enhancement. The presenter will examine how diffusion models can be effectively employed in audio restoration tasks, including methods for conditioning them on visual modalities to improve performance in challenging acoustic scenarios. Additionally, he will explore various diffusion-based approaches, such as flow matching and the Schrödinger bridge, underscoring their significance in the context of audio restoration. The goal is to offer valuable insights into the theoretical underpinnings and practical applications of these advanced techniques.

Biography

Julius Richter

Julius Richter received the B.Sc. and M.Sc. degrees in electrical engineering from the Technical University of Berlin, Germany, in 2017 and 2019, respectively. He is currently a Ph.D. student in the Signal Processing group at the University of Hamburg, Germany.

His research interests include deep generative models and multimodal learning with applications to audio–visual speech processing.

Mr. Richter was the recipient of the VDE ITG Award 2024 for his work on speech enhancement with diffusion-based generative models.