SPS Webinar: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

Date: 7 May 2024
Time: 7:00 AM ET (New York Time)
Presenter(s): Dr. Ryandhimas E. Zezario

Original article: Download Open Access article
 

Abstract

Speech assessment metrics are indicators that quantitatively measure specific attributes of speech signals, and they are vital for developing speech-related application systems. The emergence of deep learning models and the need for non-intrusive methods that can accurately evaluate speech quality or intelligibility without requiring ground-truth labels have led to the development of many deep learning-based speech assessment models. In this webinar, the presenter will first discuss the general ideas behind speech assessment metrics, including conventional signal processing-based approaches. Next, he will introduce the general concept of deploying deep learning-based speech assessment models, including existing strategies, important aspects, and challenges of model deployment. He will then introduce his team's approach, MOSA-Net, a deep learning-based non-intrusive Multi-Objective Speech Assessment model with cross-domain features. This model can simultaneously estimate speech quality, intelligibility, and distortion assessment scores of an input speech signal. Lastly, the presenter will introduce the direct integration of the speech assessment model for robust speech enhancement (SE) performance, where the latent representations of MOSA-Net are adopted to guide the SE process, yielding a quality-intelligibility (QI)-aware SE (QIA-SE) approach.

Biography

Ryandhimas E. Zezario (M) received the Ph.D. degree in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 2023.

He is currently a Postdoctoral Researcher with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan. His research interests include non-intrusive quality assessment, speech enhancement, speech processing, and speech/speaker recognition.

Dr. Zezario received the Gold Prize for the best non-intrusive systems and 1st place in the Hearing Industry Research Consortium student prizes at the Clarity Prediction Challenge 2022. He was also honored with the Best Reviewer award at IEEE ASRU 2023.