SPS SLTC/AASP Webinar: Audio Signal Processing with Microphone Arrays: Advances and Emerging Challenges

About this topic:

Multi-microphone noise reduction and speaker separation are among the most extensively researched topics in audio signal processing. Traditionally, spatial filters, often referred to as beamformers, have been designed to satisfy specific optimization criteria, such as the Minimum Variance Distortionless Response (MVDR), the Speech Distortion Weighted Multichannel Wiener Filter (SDW-MWF), and the Linearly Constrained Minimum Variance (LCMV) beamformer. Incorporating acoustic propagation through Relative Transfer Functions (RTFs) for each speaker of interest has proven beneficial. Criteria that leverage the non-Gaussianity and sparsity of speech have also been proposed, typically within the framework of independent component analysis. Significant effort is also devoted to estimating the essential building blocks of these beamformers.
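As an illustration of one of the criteria named above, the MVDR beamformer can be sketched in a few lines of NumPy. This is a minimal, self-contained example, not material from the webinar: the noise covariance matrix and the RTF steering vector below are synthetic placeholders, whereas a real system would estimate both from multichannel recordings.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4  # number of microphones (hypothetical array)

# Hypothetical relative transfer function (RTF) vector of the target speaker,
# normalized so that the first (reference) microphone has unit gain.
d = np.array([1.0, 0.8 - 0.3j, 0.5 + 0.6j, 0.9 + 0.1j])

# Hypothetical noise covariance matrix: a random Hermitian positive-definite
# matrix with diagonal loading to guarantee invertibility.
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)

# MVDR weights: w = R^{-1} d / (d^H R^{-1} d), which minimize the output
# noise power subject to the distortionless constraint w^H d = 1.
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d.conj() @ Rinv_d)

# The constraint ensures the target signal, as observed at the reference
# microphone, passes through the beamformer undistorted.
print(np.round(w.conj() @ d, 6))  # ≈ 1 + 0j
```

The same building blocks appear throughout the webinar's subject matter: accurate estimates of the noise covariance and the RTFs are exactly what the traditional, fully DNN-based, and hybrid approaches obtain in different ways.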

The multi-microphone audio processing community has enthusiastically embraced deep learning. Approaches based on Deep Neural Networks (DNNs) can be broadly categorized into three families: (i) traditional beamformers controlled by DNNs; (ii) fully DNN-based solutions without explicit beamformer formulations; and (iii) hybrid methods in which selected components, such as weights or building blocks, are learned by a DNN while the underlying beamformer criteria are preserved.

This webinar will begin with a consolidated overview of beamforming in speech enhancement, emphasizing the role of acoustic propagation. The presenter will then explore the three categories of DNN-based multi-microphone approaches with examples of recent algorithms and conclude by discussing emerging challenges, including robustness, generalizability, and explainability.

About the presenter:

Sharon Gannot (SM’93–F’21) received the B.Sc. degree from the Technion – Israel Institute of Technology, Haifa, Israel, in 1986, and the M.Sc. and Ph.D. degrees from Tel-Aviv University, Tel Aviv, Israel, in 1995 and 2000, respectively, all in electrical engineering. In 2001, he held a postdoctoral position at KU Leuven, Leuven, Belgium.

He has been with the Faculty of Engineering at Bar-Ilan University, Ramat Gan, Israel, since 2003, where he currently heads the Data Science Program and the joint Electrical Engineering and Music Program and serves as the Faculty Vice Dean. From 2018 to 2019, he was also a part-time Professor at the Technical Faculty of IT and Design, Aalborg University, Denmark. He has co-authored 340 publications in audio processing, including the invited tutorial paper “A Consolidated Perspective on Multi-Microphone Speech Enhancement and Source Separation” (S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov), IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 692–730, Apr. 2017, and the book “Audio Source Separation and Speech Enhancement” (E. Vincent, T. Virtanen, and S. Gannot, Eds.), Wiley, 2018. His research interests include machine learning and deep learning methods for speech enhancement using single- and multi-microphone processing, speaker localization and tracking, distributed algorithms for wireless ad hoc microphone networks, and multi-modal audio processing.

Dr. Gannot is a member of the International Speech Communication Association (ISCA), the European Association for Signal Processing (EURASIP), and the European Acoustics Association (EAA), and a Fellow of the Asia-Pacific Artificial Intelligence Association. He is a Senior Area Chair for the IEEE Transactions on Audio, Speech, and Language Processing and a member of the Senior Editorial Board of the IEEE Signal Processing Magazine. He chaired the Audio and Acoustic Signal Processing (AASP) Technical Committee of the IEEE Signal Processing Society (SPS) from 2017 to 2018 and has chaired the Data Science Initiative of the IEEE SPS since 2022. Since 2022, he has also been an Associate Editor of the SPS Education Center. He was General Co-Chair of IWAENC 2010, the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2013, and Interspeech 2024. He is an IEEE Fellow, cited for contributions to acoustical modeling and statistical learning in speech enhancement, and a recipient of the 2022 EURASIP Group Technical Achievement Award.