The Multimodal Information Based Speech Processing (MISP) 2022 Challenge aims to extend the application of signal processing technology to specific scenarios by using both audio and video data. We target the home TV scenario, in which 2 to 6 people converse with TV noise in the background. Our new tracks focus on audio-visual speaker diarization (AVSD) and on audio-visual diarization and recognition (AVDR).
Visit the Challenge website for more details!