Deep-learning-based audio-visual speech enhancement
We all experienced the discomfort of communicating with our friends at a cocktail party or in a pub with loud background music. When difficult acoustic scenarios like these occur, we tend to rely on several visual cues, such as lips and mouth movement of the speaker, in order to understand the speech of interest.