SPS JSTSP Webinar: Overview of Special Issue on Neural Speech and Audio Coding
Date: 08-October-2025
Time: 09:00 AM ET (New York Time)
Presenter(s): Dr. Minje Kim, Dr. Jan Skoglund, Dr. Gopala K. Anumanchipalli, Mr. Haohe Liu, Ms. Xue Jiang & Dr. Lars Villemoes
About this topic:
This webinar, part of the IEEE JSTSP webinar series, focuses on a recent special issue on Neural Speech and Audio Coding (NSAC), a rapidly emerging research field at the intersection of signal processing and artificial intelligence (AI). Recent advances in deep neural networks and data-driven methodologies have opened the door to coding gains far beyond the capabilities of traditional codecs. The special issue addresses the challenges, insights, and opportunities that arise when applying AI-based methods to speech and audio coding, emphasizing novel architectures, inference-time efficiency, low-delay processing, and robustness to adverse acoustic conditions. The webinar will begin with a brief introduction to NSAC, followed by six presentations of technical papers appearing in the special issue.
About the presenter(s):
Minje Kim received the B.Sc. degree from Ajou University, Suwon, South Korea, in 2004, the M.Sc. degree from Pohang University of Science and Technology, Pohang, South Korea, in 2006, and the Ph.D. degree in computer science from the University of Illinois Urbana-Champaign, Urbana, IL, USA, in 2016.
He is currently an Associate Professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign and an Amazon Scholar. Before that, he was an Associate Professor at Indiana University (2016-2023). Throughout his research career, he has focused on developing machine learning models for audio signal processing applications.
Dr. Kim is a recipient of various awards, including the NSF CAREER Award (2021), the IU Trustees Teaching Award (2021), the IEEE SPS Best Paper Award (2020), Google and Starkey grants for outstanding student papers at ICASSP 2013 and 2014, respectively, and the Richard T. Cheng Endowed Fellowship from UIUC in 2011. He is the Chair of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee (2025-2026). He serves as a Senior Area Editor for IEEE Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters, an Associate Editor for the EURASIP Journal on Audio, Speech, and Music Processing, and a Consulting Associate Editor for the IEEE Open Journal of Signal Processing. He served as the General Chair of IEEE WASPAA 2023 and as a reviewer, program committee member, or area chair for major machine learning and signal processing venues. He is the inventor of more than 60 patents.
Jan Skoglund received the Ph.D. degree from Chalmers University of Technology, Gothenburg, Sweden, in 1998.
He leads a team at Google in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering. These components have been deployed in Google software products such as Meet and hardware products such as Chromebooks. Prior to that, he worked on low-bit-rate speech coding at AT&T Labs-Research, Florham Park, NJ. He was with Global IP Solutions (GIPS), San Francisco, CA, from 2000 to 2011, working on speech and audio processing tailored for packet-switched networks, such as compression, enhancement, and echo cancellation. GIPS' audio and video technology was deployed by companies such as IBM, Google, Yahoo, WebEx, Skype, and Samsung, and was open-sourced as WebRTC after Google acquired the company in 2011. Since then, he has been part of the Chrome team at Google.
Dr. Skoglund is a Senior Member of the IEEE, involved in the IEEE SPS Audio and Acoustic Signal Processing Technical Committee and the IEEE SPS Speech and Language Processing Technical Committee, and an Associate Editor for IEEE Transactions on Audio, Speech, and Language Processing.
Gopala K. Anumanchipalli received the B.Tech. and M.Sc. degrees in computer science from the International Institute of Information Technology, Hyderabad, India, in 2008, and the Ph.D. degree in electrical and computer engineering from Instituto Superior Técnico, Lisbon, Portugal, in 2013.
He is currently the Robert E. and Beverly A. Brooks Assistant Professor in Electrical Engineering and Computer Sciences with the University of California, Berkeley, Berkeley, CA, USA, where he leads the Berkeley Speech Group. His research focuses on the science and engineering of spoken language, with applications in AI and healthcare.
Dr. Anumanchipalli is a Member of Berkeley AI Research, Weill Neurosciences, and Computational Precision Health. He has been recognized as a Kavli Fellow, Google Research Scholar, JP Morgan AI Research Awardee, Hellman Fellow, and Noyce Innovator, among other honors.
Haohe Liu received the B.Eng. degree from Northwestern Polytechnical University, Xi’an, China, in 2020. He is currently working toward the Ph.D. degree (final year) with the Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
His past research has contributed to audio quality enhancement, audio generation, source separation, and audio recognition. Among other work, he developed AudioLDM for text-to-audio generation, which has attracted wide attention in the open-source community.
Mr. Liu’s work as primary author has been accepted at top journals and conferences such as IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE Journal of Selected Topics in Signal Processing, ICML, AAAI, ICASSP, and INTERSPEECH. His recent projects include AudioLDM, VoiceFixer, AudioSR, and NaturalSpeech.
Xue Jiang received the B.S. degree in 2020 from the Communication University of China, Beijing, China, where she is currently working toward the Ph.D. degree with the School of Information and Communication Engineering.
Her research interests include AI-based speech and audio signal processing, particularly in neural speech/audio coding and speech/audio representation learning.
Lars Villemoes received the Ph.D. degree in mathematics from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 1995, and the Swedish Docent degree in 2001.
He is currently a Senior Principal Researcher with Dolby Sweden AB, Stockholm, where he leads fundamental research in audio coding and adjacent fields of audio signal processing. From 1995 to 1997, he was a Postdoctoral Researcher, visiting the Department of Mathematics, Yale University, New Haven, CT, USA, and the Signal Processing Group of the Department of Signals, Systems, and Sensors, KTH. From 1997 to 2001, he was a Research Associate in wavelet theory with the Department of Mathematics, KTH. In 2007, he joined Dolby through its acquisition of Coding Technologies, which he had joined in 2000. His research interests mainly include time-frequency analysis, machine learning, and auditory modeling. He has contributed to the MPEG audio standards HE-AAC, MPEG Surround, SAOC, and USAC.
Dr. Villemoes is a Member of the IEEE. He was one of the main architects of the AC-4 codec standardized by ETSI and developed many of its channel- and object-based coding tools.