SPS SLTC/AASP TC Webinar: Advanced Methods in Regional Speech Enhancement and Far-to-Near-Field Transformation

Date: 9 July 2025
Time: 11:00 AM ET (New York Time)
Presenter(s): Dr. Dong Yu, Dr. Meng Yu, Dr. Tong Lei

Abstract

Enhancing speech signals in complex acoustic environments remains a critical challenge in audio processing. The presenters' recent work introduces several approaches to key issues in this domain. They introduced a novel audio zooming technique based on deep learning, shifting from traditional direction-based beamforming to a user-defined, adjustable 3D region for sound capture. This advancement enables precise and flexible audio acquisition, supporting real-time applications such as remote conferencing, education, and live streaming. Building on the concept of region-based sound capture, they further aimed to transform the captured far-field audio into near-field quality. Leveraging real-world acoustic data, they proposed a novel framework that integrates a Schrödinger Bridge-based diffusion model with generative adversarial networks. This approach achieved state-of-the-art performance in reducing noise and reverberation, significantly improving speech quality. The methodology establishes a new benchmark for real-world far-field to near-field enhancement and provides interpretable insights into model behavior and the spectral recovery characteristics of generative versus predictive approaches. These advancements offer effective solutions to long-standing challenges in speech processing, enabling high-quality audio experiences in diverse applications.

Biography

Dong Yu (M’97-SM’06-F’18) received the B.S. degree (with honors) in electrical engineering from Zhejiang University, China, the M.S. degree in computer science from Indiana University at Bloomington, IN, USA, the M.S. degree in electrical engineering from the Chinese Academy of Sciences, China, and the Ph.D. degree in computer science from the University of Idaho, Idaho, USA.

He is currently a distinguished scientist and vice general manager at Tencent AI Lab. Prior to joining Tencent in 2017, he was a principal researcher at Microsoft Research (Redmond). His research focuses on speech, natural language, and multimodal processing.

Dr. Yu has published two monographs and over 400 papers. His works have been widely cited and recognized by the IEEE Signal Processing Society best transaction paper award in 2013, 2016, 2020, and 2022, the 2021 NAACL best long paper award, the 2022 IEEE Signal Processing Magazine best paper award, the 2022 IEEE Signal Processing Magazine best column award, and the 2023 EMNLP outstanding paper award. He has also served on the editorial boards of many journals and the organizing committees of various conferences. He was the chair of the IEEE Speech and Language Processing Technical Committee from 2021 to 2022 and the technical co-chair of ICASSP-2021.

Meng Yu received the B.S. degree in mathematics from Peking University, Beijing, China in 2007, and the Ph.D. degree in mathematics from the University of California, Irvine, CA, USA in 2012.

He has been a Principal Research Scientist at Tencent AI Lab since 2016. From 2013 to 2016, he worked as a Staff Research Engineer at Audience (a Knowles Company), focusing on audio/speech enhancement for voice communication and improving speech recognition. Prior to that, he was a Software Engineer at Cisco from 2012 to 2013, specializing in speaker segmentation and recognition. His research interests focus on audio and speech processing, with a particular emphasis on single and multi-channel speech enhancement, dereverberation, echo cancellation, howling suppression, and far-field frontend speech enhancement applications.

Tong Lei received the B.Sc. degree in physics from Nanjing University, Nanjing, China in 2020. She is currently pursuing her Ph.D. degree at the Key Laboratory of Modern Acoustics, Nanjing University.

She is currently interning at Tencent AI Lab and will join as a full-time employee in July 2025 after obtaining her Ph.D. degree. Her research interests include speech enhancement, microphone array signal processing, and diffusion models.