Date: 11-June-2026
Time: 9:00 AM ET (New York Time)
Presenter: Dr. Kong Aik Lee
Abstract
Speaker identity is one of the most critical elements conveyed by speech signals. With accurate modeling, this information can be used in recognition and generation tasks, including speaker recognition, speaker diarization, target speaker extraction, speech synthesis, and voice anonymization. Over the past decades, speaker modeling has evolved from generative approaches, such as i-vectors, to powerful deep learning-based embeddings, including x-vectors and their modern variants. Despite their success, these methods largely rely on deterministic point estimates, overlooking the inherent uncertainty arising from data variability, noise, and limited observations. In this talk, the presenter revisits speaker representation learning through the lens of uncertainty. He will first provide a historical perspective on generative and discriminative modeling paradigms, highlighting their assumptions and limitations. He will then introduce probabilistic speaker embeddings based on a linear Gaussian framework, where uncertainty is explicitly modeled alongside the embedding itself. The xi-vector approach and its extensions are presented as a concrete realization, enabling both uncertainty-aware scoring and training.
Biography
Kong Aik Lee received the Ph.D. degree from Nanyang Technological University, Singapore, in 2006.
He has been an Associate Professor at The Hong Kong Polytechnic University since 2023. Following his graduation, he joined the Institute for Infocomm Research, Singapore, where he worked as a Research Scientist and concurrently served as a Strategic Planning Manager. From 2018 to 2020, he was a Senior Principal Researcher at the Data Science Research Laboratories, NEC Corporation, Tokyo, Japan. From 2020 to 2023, he served as a Principal Scientist and Group Leader at the Agency for Science, Technology and Research (A*STAR), Singapore, while concurrently holding an appointment as an Associate Professor at the Singapore Institute of Technology. His research interests include speech privacy, voice biometrics and security, speaker recognition and anonymization, speech spoofing and countermeasures, machine learning and deep learning, and signal processing.
Dr. Lee is a recipient of the Singapore IES Prestigious Engineering Achievement Award (2013) and the IEEE ICME Outstanding Service Award (2020). He has served on the Editorial Board of Computer Speech and Language (Elsevier) since 2016 and was an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing from 2017 to 2021. He was the General Chair of the Speaker Odyssey 2020 Workshop and an elected member of the IEEE Speech and Language Processing Technical Committee (2019-2021, 2022-2024). He currently serves as a Senior Associate Editor for the IEEE Signal Processing Letters, Chair of the ISCA Speaker and Language Characterization (SpLC) SIG, and ISCA Lead AC (Speaker and Language Characterization, 2026 to 2027).
