SPS SLTC/AASP TC Webinar: Conversational Speech Processing and Recognition: Speech Separation, End-to-End Modeling, and Speaker Diarization

Date: 23 July 2024
Time: 1:00 PM ET (New York Time)
Presenter(s): Dr. Takuya Yoshioka

Abstract

Recognizing conversational speech means processing multi-talker, human-to-human communication. The challenges posed by natural conversations have driven progress in several areas, including speech separation, end-to-end modeling, speaker diarization, and the use of self-supervised models. This webinar will introduce recent research advances in these areas, as well as insights gained from applying these methods in real-world commercial scenarios.

Biography

Takuya Yoshioka

Takuya Yoshioka received the B.Eng., M.Inf., and Ph.D. degrees in informatics from Kyoto University, Kyoto, Japan, in 2004, 2006, and 2010, respectively.

He has been the Director of Research at AssemblyAI Inc., US, since 2023, leading the company's model and algorithm development efforts, encompassing ASR, speaker diarization, and NLP. Prior to joining AssemblyAI, he led a research team at Microsoft Azure Cognitive Services Research, developing technologies for speech enhancement, speech generation, meeting transcription, and speaker diarization. Before this role, he conducted research in speech processing at Microsoft Research and NTT Communication Science Laboratories for more than 10 years.

Dr. Yoshioka received the Conference Best Paper Award for Industry from IEEE SPS in 2022 and led a winning team in the CHiME-3 Challenge in 2015.