Every month, the joint laboratory invites outside speakers to take part in seminars for its partners.

Haohe Liu (University of Surrey): “Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder”

Abstract: Latent diffusion models (LDMs) have demonstrated exceptional generative capabilities across various modalities. This talk will explore LDMs as a coarse-to-fine audio decoder, offering a versatile framework for audio tasks. We will begin by covering the fundamentals of diffusion models and their control over forward and backward processes. Next, we will look into specific applications, including the AudioLDM series for text-to-audio generation, models for audio quality enhancement, and neural audio codecs. The talk will highlight common design principles across these models and include interactive demos. We will conclude by discussing the strengths and limitations of LDMs in audio decoding and potential future research directions.
Bio: Haohe Liu is a final-year PhD student at the Centre for Vision, Speech, and Signal Processing (CVSSP), University of Surrey, UK. He earned his B.Eng. from Northwestern Polytechnical University, Xi’an, China, in 2020. His contributions span audio quality enhancement, generation, source separation, and recognition. His work as primary author has appeared in top venues such as TPAMI, TASLP, ICML, AAAI, ICASSP, and INTERSPEECH. His notable projects include AudioLDM, VoiceFixer, AudioSR, and NaturalSpeech. He has also interned at Meta, Microsoft, and ByteDance.