SPS Webinar: AudioLDM 2: Learning Holistic Audio Generation with Self-Supervised Pretraining

Date: 12-August-2026
Time: 11:30 AM ET (New York Time)
Presenter: Dr. Haohe Liu

Based on the IEEE Xplore® article under the same title
Published in IEEE Transactions on Audio, Speech, and Language Processing, May 2024.

Download article: The original article will be publicly available for download for 48 hours, starting on the day of the webinar.

About this topic:

AudioLDM 2 is a holistic framework for unified audio generation that produces speech, music, and sound effects using a single model. Unlike prior approaches that require separate architectures with task-specific designs for each audio type, AudioLDM 2 introduces a general audio representation called the "language of audio" (LOA), learned through AudioMAE, a self-supervised pretrained model. During generation, a GPT-2 model translates input conditions such as text into LOA, which then guides a latent diffusion model to synthesize high-quality audio. This design unifies diverse audio generation tasks under one framework while enabling advantages such as in-context learning and reusable pretrained components.
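As a rough illustration of the two-stage flow described above (condition to LOA, then LOA-guided synthesis), the toy Python sketch below may help. It is not the AudioLDM 2 implementation: the function names `text_to_loa`, `denoise_with_loa`, and `generate`, the tensor sizes, and the update rule are all hypothetical stand-ins chosen for illustration, and no real GPT-2, AudioMAE, or diffusion model is involved.

```python
import hashlib
import random


def text_to_loa(prompt, num_tokens=8, dim=4):
    """Hypothetical stand-in for the GPT-2 stage: text condition -> LOA sequence.

    A real system would predict AudioMAE-style features; here we only derive
    deterministic pseudo-random vectors from the prompt (an assumption).
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def denoise_with_loa(loa, steps=50, seed=0):
    """Hypothetical stand-in for the latent diffusion stage.

    Starts from Gaussian noise and, at every step, moves the latent a fixed
    fraction toward the LOA conditioning. This is a toy refinement loop,
    not a real diffusion sampler.
    """
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in row] for row in loa]
    for _ in range(steps):
        latent = [
            [x + 0.2 * (target - x) for x, target in zip(lat_row, loa_row)]
            for lat_row, loa_row in zip(latent, loa)
        ]
    return latent


def generate(prompt, steps=50):
    """Chain the two stages: condition -> LOA -> refined latent."""
    loa = text_to_loa(prompt)
    return denoise_with_loa(loa, steps=steps)


if __name__ == "__main__":
    latent = generate("a dog barking in the rain")
    print(len(latent), "LOA-conditioned latent frames of size", len(latent[0]))
```

The point of the sketch is the separation of concerns: the first stage only needs to produce the intermediate representation, and the second stage only needs to consume it, which is what allows a single generator to serve different input conditions and audio types.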

The talk will cover the motivation behind holistic audio generation, the AudioLDM 2 architecture, key experimental findings, practical lessons learned from building a unified audio generation system, and recent advances in this rapidly evolving field.

About the presenter:

Haohe Liu (M’26) received the Ph.D. degree from the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, U.K., in 2026, supervised by Prof. Mark D. Plumbley and Prof. Wenwu Wang.

He is currently a Research Scientist at Meta SuperIntelligence Lab (FAIR), Seattle, WA, USA. His research spans generative AI for audio, speech, and music, with a focus on developing foundational models that address core machine learning challenges. He is best known as the creator of the AudioLDM series. His open-source projects, including VoiceFixer, AudioSR, and AudioLDM, have collectively received over 11,000 GitHub stars, and his work has received over 5,600 citations.

Dr. Liu received the Postgraduate Researcher of the Year 2024 Award from CVSSP, the Best Technical Paper Award at the 159th AES Convention, and the Judges' Award in the DCASE 2023 Foley Sound Synthesis Challenge. His work has been published at venues including ICML, NeurIPS, AAAI, TPAMI, JSTSP, ICASSP, INTERSPEECH, and IEEE/ACM Transactions on Audio, Speech, and Language Processing.
