SPS SLTC/AASP Webinar: Advances and Challenges in Audio-Visual Sound Source Localization
About this topic:
Audio-visual machine learning models are transforming video understanding by effectively integrating auditory and visual data, enabling diverse applications such as audio-visual scene analysis. A prominent example within this field is visual sound source localization, which identifies the spatial location of sound sources in visual scenes, for both environmental audio and speech. Recent methods leverage various intra- and inter-modality model design choices, training invariance strategies, and data augmentations, leading to significant performance improvements. Despite these advancements, however, current models often exhibit biases towards visual information, making it unclear how much each modality contributes to the final decision. Furthermore, existing task definitions typically prioritize visual perception, prevalent benchmarks can be noisy or unreliable, and standard evaluation metrics only partially capture models' true performance. This webinar will discuss recent advancements in visual sound source localization, critically assess existing methodologies, present prevailing challenges, and highlight potential pathways for future research addressing these issues.
About the presenter:
Magdalena Fuentes received the B.Eng. degree in electrical engineering from Universidad de la República, Montevideo, Uruguay, in 2015, and the Ph.D. degree in signal and image processing from Université Paris-Saclay, France, in 2019, conducting her doctoral research in the ADASP group at Télécom Paris.
Since 2022, she has been an Assistant Professor of Music Technology and Integrated Design & Media at New York University (NYU), where she serves as core faculty at the Music and Audio Research Laboratory (MARL) and as affiliated faculty in the Computer Science and Engineering Department at the Tandon School of Engineering. Previously, she was a Provost’s Postdoctoral Fellow at MARL and the Center for Urban Science and Progress (CUSP) at NYU. Her research interests include multimodal machine learning, music information retrieval, and environmental sound analysis.
Dr. Fuentes is a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (2021–2026). She has served as Program Chair for ISMIR 2025 and for the DCASE Workshop (2021, 2023, 2025), as Area Chair for ICASSP (2022–2025), and as Diversity Chair for WASPAA 2023. Her research has been supported by awards from NYU, Google, and the National Institutes of Health (NIH).