Speech-enabled systems often suffer performance degradation in real-world scenarios, primarily because of adverse acoustic conditions and interactions among multiple speakers. Improving front-end speech processing is therefore vital to the performance of back-end systems. However, most existing front-end techniques rely solely on the audio modality and have reached performance plateaus. Building on the observation that visual cues aid human speech perception, the Multimodal Information Based Speech Processing (MISP) 2023 Challenge focuses on the Audio-Visual Target Speaker Extraction (AVTSE) problem, which aims to extract the target speaker’s speech from mixtures containing multiple speakers and background noise. The MISP 2023 Challenge addresses this problem explicitly in a real scenario with a complex acoustic environment. It provides a benchmark dataset collected in home TV environments, reflecting the challenges of such settings. In addition, to explore the impact of AVTSE on the back-end task, we use a pre-trained speech recognition model to evaluate the performance of the AVTSE systems.
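The back-end evaluation described above can be illustrated with a minimal sketch. It assumes that the recognizer's transcript of the extracted speech is scored against a reference transcript with an edit-distance error rate. That is a common way to score speech recognition output, but the text above does not name the metric, so treat it as an assumption. The function names `edit_distance` and `error_rate` are hypothetical, and running the pre-trained recognizer itself is out of scope here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance normalized by reference length (assumed metric)."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

Under this assumption, one would transcribe both the unprocessed mixture and the AVTSE output with the same recognizer and compare the two values of `error_rate` against the same reference; a lower value after extraction would suggest that the front-end helps the back-end. Tokens can be words or characters, depending on the language of the transcripts.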
© Copyright 2024 IEEE – All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.