SPS Webinar: 19 January 2023, presented by Dr. Fei Tao and Dr. Carlos Busso


Upcoming SPS Webinar!

Title: Advances on Multimodal Machine Learning Solutions for Speech Processing Tasks and Emotion Recognition
Date: 19 January 2023
Time: 1:00 PM Eastern (New York time)
Duration: Approximately 1 Hour
Presenters: Dr. Fei Tao and Dr. Carlos Busso

Based on the IEEE Xplore® article: End-to-End Audiovisual Speech Recognition System with Multitask Learning
Published: IEEE Transactions on Multimedia, February 2020, available in IEEE Xplore®

Download: The original article is available for download.

 

Register for the Webinar

 

Abstract:

Recent advances in multimodal processing have led to promising solutions for speech-processing tasks. One example is automatic speech recognition (ASR), which is a key component in current speech-based systems. Since surrounding acoustic noise can severely degrade the performance of an ASR system, an appealing solution is to augment conventional audio-based ASR systems with visual features describing lip activity. We describe a novel end-to-end, multitask learning (MTL), audiovisual ASR (AV-ASR) system. A key novelty of the approach is the use of MTL, where the primary task is AV-ASR and the secondary task is audiovisual voice activity detection (AV-VAD). We obtain a robust and accurate audiovisual system that generalizes across conditions. By detecting segments with speech activity, the AV-ASR performance improves because its connectionist temporal classification (CTC) loss function can leverage the AV-VAD alignment information. Furthermore, the end-to-end system learns from the raw audiovisual inputs a discriminative high-level representation for both speech tasks, providing the flexibility to mine information directly from the data. The proposed architecture considers the temporal dynamics within and across modalities, providing an appealing and practical fusion scheme. In addition to state-of-the-art performance in AV-ASR, the proposed solution also provides valuable information about speech activity, solving two of the most important tasks in speech-based applications.

This webinar will also discuss advances in multimodal solutions for emotion recognition. We describe multimodal pretext tasks that are carefully designed to learn better representations for predicting emotional cues from speech, leveraging the relationship between acoustic and facial features. We also discuss our current efforts to design multimodal emotion recognition strategies that effectively combine auxiliary networks, a transformer architecture, and an optimized training mechanism for aligning modalities, capturing temporal information, and handling missing features. These models offer principled solutions to increase the generalization and robustness of emotion recognition systems.
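To make the multitask setup concrete, the sketch below is a minimal, illustrative PyTorch example, not the implementation from the article: per-modality recurrent encoders, a fusion layer, a CTC head for the primary AV-ASR task, and a frame-level head for the secondary AV-VAD task, trained with a weighted joint loss. All layer sizes, the concatenation-based fusion, and the loss weight are assumptions made for illustration.

import torch
import torch.nn as nn

class AVMultitaskModel(nn.Module):
    """Shared audiovisual encoder with two heads: CTC-based AV-ASR and frame-level AV-VAD."""
    def __init__(self, audio_dim=40, visual_dim=128, hidden=256, vocab_size=30):
        super().__init__()
        # Per-modality recurrent encoders capture temporal dynamics within each stream.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True, bidirectional=True)
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True, bidirectional=True)
        # Fusion layer models dynamics across modalities (simple concatenation, an assumption).
        self.fusion = nn.GRU(4 * hidden, hidden, batch_first=True, bidirectional=True)
        # Primary head: token posteriors for the CTC objective (+1 output for the CTC blank).
        self.asr_head = nn.Linear(2 * hidden, vocab_size + 1)
        # Secondary head: per-frame speech/non-speech score for AV-VAD.
        self.vad_head = nn.Linear(2 * hidden, 1)

    def forward(self, audio, visual):
        a, _ = self.audio_enc(audio)    # (batch, frames, 2*hidden)
        v, _ = self.visual_enc(visual)  # assumes audio and visual streams share a frame rate
        h, _ = self.fusion(torch.cat([a, v], dim=-1))
        asr_log_probs = self.asr_head(h).log_softmax(dim=-1)  # CTC expects log-probabilities
        vad_logits = self.vad_head(h).squeeze(-1)
        return asr_log_probs, vad_logits

# Toy joint training step; the 0.5 weight on the VAD loss is illustrative.
model = AVMultitaskModel()
ctc_loss = nn.CTCLoss(blank=30)
vad_loss = nn.BCEWithLogitsLoss()

audio = torch.randn(2, 100, 40)                    # e.g., 40-dim filterbank features
visual = torch.randn(2, 100, 128)                  # e.g., lip-region embeddings
targets = torch.randint(1, 30, (2, 20))            # dummy transcripts
input_lens = torch.full((2,), 100)
target_lens = torch.full((2,), 20)
speech_mask = (torch.rand(2, 100) > 0.3).float()   # dummy frame-level speech labels

asr_log_probs, vad_logits = model(audio, visual)
loss = ctc_loss(asr_log_probs.transpose(0, 1), targets, input_lens, target_lens) \
       + 0.5 * vad_loss(vad_logits, speech_mask)
loss.backward()

In this sketch, the frame-level VAD supervision plays the role of the alignment information mentioned in the abstract, complementing the sequence-level CTC objective on the shared representation.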

Biography:


Dr. Fei Tao received the B.S. degree in electrical engineering from Beijing Jiaotong University (BJTU), Beijing, China, in 2009, the M.S. degree from Texas Southern University (TSU), Houston, TX, USA, and the Ph.D. degree from the University of Texas at Dallas, Richardson, TX, USA, in 2018.

He is currently a Senior Applied Scientist at Amazon Web Services (AWS), where he leads a team developing multimodal artificial intelligence/machine learning (AIML) models. His research interests are in multimodal machine learning involving audio, video, and text. He has worked on speech recognition, speaker verification, multimodal emotion recognition, multimodal active speaker detection, source separation, text-to-speech synthesis, music synthesis, and multimodal advertisement recommendation.

Dr. Tao has served as a reviewer for top conferences, including ICMI, Interspeech, ICASSP, NAACL, ACL, EMNLP, and AAAI, and for top journals, such as the IEEE Transactions on Acoustics, Speech, and Signal Processing.

 


Dr. Carlos Busso received the B.S. and M.S. (hons.) degrees in electrical engineering from the University of Chile, Santiago, Chile, in 2000 and 2003, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California (USC), Los Angeles, CA, USA, in 2008.

He is a Professor in the Department of Electrical and Computer Engineering at the University of Texas at Dallas, where he is also the director of the Multimodal Signal Processing (MSP) Laboratory. His research interests are in human-centered multimodal machine intelligence and applications, with a focus on the broad areas of affective computing, multimodal human-machine interfaces, in-vehicle active safety systems, and machine learning methods for multimodal processing. He has worked on audio-visual emotion recognition, analysis of emotional modulation in gestures and speech, designing realistic human-like virtual characters, and detection of driver distractions. His research group has received funding from different agencies, including the National Science Foundation (NSF), the National Institutes of Health (NIH), the Biometric Center of Excellence (BCOE), and the Semiconductor Research Corporation (SRC), as well as grants from industry (Samsung, Robert Bosch LLC, Microsoft, Honda Research Institute).

Dr. Busso is a recipient of an NSF CAREER Award. In 2014, he received the ICMI Ten-Year Technical Impact Award. In 2015, his student (N. Li) received the third prize of the IEEE ITSS Best Dissertation Award. He also received the Hewlett Packard Best Paper Award at IEEE ICME 2011 (with J. Jain) and the Best Paper Award at AAAC ACII 2017 (with Yannakakis and Cowie). He received the Best of IEEE Transactions on Affective Computing Paper Collection in 2021 (with R. Lotfian) and the Best Paper Award from the IEEE Transactions on Affective Computing in 2022 (with Yannakakis and Cowie). He is the co-author of the winning paper of the Classifier Sub-Challenge event at the Interspeech 2009 emotion challenge. He has served in chair positions for top conferences in the field, including the International Conference on Multimodal Interaction (ICMI), Interspeech, the IEEE International Conference on Multimedia & Expo (ICME), the IEEE International Conference on Automatic Face and Gesture Recognition (FG), and the AAAC Conference on Affective Computing and Intelligent Interaction (ACII). He has served as the general chair of ACII 2017 and ICMI 2021, and as the program chair of ICMI 2016, VCIP 2017, and ASRU 2021. He is currently serving as an associate editor of the IEEE Transactions on Affective Computing. He is an IEEE Fellow, a member of the International Speech Communication Association (ISCA) and the Association for the Advancement of Affective Computing (AAAC), and a senior member of the Association for Computing Machinery (ACM).
