SPS SLTC/AASP Webinar: Giving Voice and Face to AI

SPS Webinar

Date: 14-April-2026
Time: 8:00 AM ET (New York Time)
Presenter: Dr. Joon Son Chung

Abstract

As AI systems advance, building natural and intuitive multimodal interfaces is becoming increasingly critical. This talk examines technologies that equip AI with both a voice and a face, improving their capacity for seamless, expressive communication with humans.

He will discuss how incorporating visual and linguistic signals into speech synthesis aligns the acoustic output with facial and textual attributes, yielding more natural and expressive speech generation. His recent video-to-speech work synthesizes speech directly from visual input, enabling communication where audio signals are limited or absent. He will also present his talking head synthesis system, in which audio input drives lifelike facial animation, effectively giving a face to the AI's voice and enriching the multimodal interaction.

Biography


Joon Son Chung received his B.A. and Ph.D. from the University of Oxford, where he worked with Prof. Andrew Zisserman.

He is an associate professor in the School of Electrical Engineering at KAIST, where he directs the Multimodal AI Lab. Previously, he was a research team lead at Naver Corporation, where he led the development of speech recognition models for applications including Clova Note.

Dr. Chung has published in top-tier venues, including TPAMI and IJCV, and has received best paper awards at Interspeech and ACCV.