SPS SL TC Webinar: End-to-End Automatic Speech Recognition

Date: 10 May 2024
Time: 1:00 PM ET (New York Time)
Presenter(s): Dr. Jinyu Li

Abstract

The field of automatic speech recognition (ASR) is now dominated by end-to-end (E2E) models that directly map speech to text. In this talk, the presenter will give an overview of E2E ASR models and introduce recent progress from an industry perspective. To design an E2E model with high accuracy and low latency, a masking strategy is applied to the Transformer Transducer. He will discuss technologies that use text-only data both for general model training, through pretraining, and for adaptation to a new domain, through augmentation and factorization. He will also discuss how to build multilingual ASR models that serve all users. He will then extend E2E modeling to streaming multi-speaker ASR and end the talk with some new research opportunities to explore.

Biography

Jinyu Li (M’08, SM’21) received the B.E. and M.E. degrees in electrical engineering and information system from the University of Science and Technology of China, Hefei, China, in 1997 and 2000, respectively. He received the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2008.

He has been with Microsoft, Redmond, WA, USA, since 2008, where he currently serves as a Partner Applied Science Manager and leads a team dedicated to designing and enhancing speech modeling algorithms and technologies. The team's aim is to ensure that Microsoft products maintain cutting-edge quality within the industry. From 2000 to 2003, he was a Researcher at the Intel China Research Center and a Research Manager at iFlytek, China. His research areas include end-to-end modeling for speech recognition and speech translation, deep learning, acoustic modeling, and noise robustness.

Dr. Li has been a member of the IEEE Speech and Language Processing Technical Committee since 2017. He also served as an associate editor of IEEE/ACM Transactions on Audio, Speech and Language Processing from 2015 to 2020. He was named an Industrial Distinguished Leader by the Asia-Pacific Signal and Information Processing Association (APSIPA) in 2021 and received the APSIPA Sadaoki Furui Prize Paper Award in 2023.