SPS SLTC/AASP Webinar: Giving Voice and Face to AI

SPS Webinar

Date: 14-April-2026
Time: 8:00 AM ET (New York Time)
Presenter: Dr. Joon Son Chung

Abstract

As AI systems advance, building natural and intuitive multimodal interfaces is becoming increasingly critical. This talk examines technologies that equip AI with both a voice and a face, improving their capacity for seamless, expressive communication with humans.

He will discuss how incorporating visual and linguistic signals into speech synthesis aligns the acoustic output with facial and textual attributes, yielding more natural and expressive speech generation. His recent video-to-speech work synthesizes speech directly from visual input, enabling communication where audio signals are limited or absent. He will also present his talking head synthesis system, in which audio input drives lifelike facial animation, effectively giving a face to the AI's voice and enriching the multimodal interaction.

Biography


Joon Son Chung received his B.A. and Ph.D. from the University of Oxford, where he worked with Prof. Andrew Zisserman.

He is an associate professor in the School of Electrical Engineering at KAIST, where he directs the Multimodal AI Lab. Previously, he was a research team lead at Naver Corporation, where he led the development of speech recognition models for applications including Clova Note.

Dr. Chung has published in top-tier venues, including TPAMI and IJCV, and has received best paper awards at Interspeech and ACCV.