Advances and Challenges in Audio-Visual Sound Source Localization (video)

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Audio-visual machine learning models are transforming video understanding by integrating auditory and visual data, enabling applications such as audio-visual scene analysis. A prominent example within this field is visual sound source localization, which identifies the spatial location of sounds in visual scenes and applies to both environmental audio and speech. Recent methods leverage various intra- and inter-modality model design choices, training-invariance strategies, and data augmentations, leading to significant performance improvements. Despite these advancements, current models often exhibit biases towards visual information, making it unclear how much each modality contributes to the final decision. Furthermore, existing task definitions typically prioritize visual perception, prevalent benchmarks can be noisy or unreliable, and standard evaluation metrics only partially capture the models' true performance. This webinar will discuss recent advancements in visual sound source localization, critically assess existing methodologies, present prevailing challenges, and highlight potential pathways for future research to address these issues.
Duration
1:09:00