SPS Webinar: 17 November 2022, presented by Dr. Daniel Michelsanti

News and Resources for Members of the IEEE Signal Processing Society

Upcoming SPS Webinar!

Title: Audio-visual Speech Enhancement and Separation Based on Deep Learning
Date:  17 November 2022
Time: 1:00 PM Eastern (New York time)
Duration: Approximately 1 Hour
Presenter: Dr. Daniel Michelsanti

Based on the IEEE Xplore® article: An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Published: IEEE/ACM Transactions on Audio, Speech, and Language Processing, March 2021, available in IEEE Xplore®

Download: The original article is available for download.

 

Register for the Webinar

 

Abstract:

Speech enhancement and speech separation are two related tasks whose purpose is to extract one or several target speech signals, respectively, from a mixture of sounds generated by multiple sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been exploited in speech enhancement and speech separation systems.
 
To fuse acoustic and visual information efficiently, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The rapid proliferation of techniques for extracting features and fusing multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning.
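To make the fusion idea concrete, here is a minimal NumPy sketch of one common pattern: per-frame audio and visual features are concatenated (early fusion) and mapped to a time-frequency mask that is applied to the noisy spectrogram. All dimensions, the random stand-in weights, and the masking step are illustrative assumptions, not the architecture from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only):
T, F, V = 100, 257, 64  # time frames, frequency bins, visual embedding size

noisy_spec = np.abs(rng.standard_normal((T, F)))  # noisy magnitude spectrogram
visual_emb = rng.standard_normal((T, V))          # per-frame lip/face embeddings

# Early fusion: concatenate audio and visual features frame by frame
fused = np.concatenate([noisy_spec, visual_emb], axis=1)  # shape (T, F + V)

# Stand-in for a trained network: a single random linear layer + sigmoid
W = rng.standard_normal((F + V, F)) * 0.01
mask = 1.0 / (1.0 + np.exp(-(fused @ W)))  # soft mask, values in (0, 1)

# Enhancement: attenuate time-frequency bins the mask deems non-speech
enhanced_spec = mask * noisy_spec
print(enhanced_spec.shape)  # (100, 257)
```

In a real system, the linear layer would be replaced by a deep network trained on paired noisy/clean data, and the enhanced magnitude would be recombined with a phase estimate to synthesize a waveform.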
 
In this webinar, we provide a survey of this research topic. We also survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, as well as evaluation methods, since these are used to compare different systems and assess their performance.

Biography:

Dr. Daniel Michelsanti (S'16, M'21) received the B.Sc. degree in computer science and electronic engineering (cum laude) from the University of Perugia, Italy, and the M.Sc. degree in vision, graphics and interactive systems from Aalborg University, Denmark, in 2014 and 2017, respectively. He received the Ph.D. degree from Aalborg University, Denmark, in 2021, with a thesis investigating audio-visual speech enhancement using deep learning.
 
He was then employed as a research assistant in the section for Artificial Intelligence and Sound (AIS), Department of Electronic Systems, Aalborg University, Denmark, where he conducted research on acoustic signal analysis for Industry 4.0. Currently, he is an industrial postdoctoral researcher at Oticon and Aalborg University.
 
Dr. Michelsanti's research interests are in the areas of speech enhancement, computer vision, and machine learning, specifically deep learning.
