Multimodal Information Based Speech Processing (MISP) 2023 Challenge: ICASSP 2024




Speech-enabled systems often suffer performance degradation in real-world scenarios, primarily because of adverse acoustic conditions and interactions among multiple speakers. Improving front-end speech processing is therefore vital to the performance of back-end systems. However, most existing front-end techniques rely solely on the audio modality and have reached performance plateaus. Building on the observation that visual cues aid human speech perception, the Multimodal Information Based Speech Processing (MISP) 2023 Challenge focuses on the Audio-Visual Target Speaker Extraction (AVTSE) problem, which aims to extract the target speaker’s speech from mixtures containing multiple speakers and background noise. The MISP 2023 Challenge addresses this problem explicitly in a real scenario with a complex acoustic environment, and it provides a benchmark dataset collected in home TV environments that reflects the challenges of such settings. In addition, to explore the impact of AVTSE on the back-end task, we use a pre-trained speech recognition model to evaluate AVTSE performance.

Visit the Challenge website for details and more information!


Technical Committee: Image, Video, and Multidimensional Signal Processing, Speech and Language Processing

