In this talk, I will give an overview of my work on sound source separation, the task of automatically extracting the constituent components of an audio recording from their observed mixture. I will address it in the time-frequency domain, which reveals the underlying structure of sounds. Most methods process only spectrogram-like quantities and discard the phase information: this limits their performance, which in turn motivates better accounting for the phase in source separation. First, I will present my work on model-based phase recovery: this approach consists of extracting phase properties from time-domain signal models (such as mixtures of sinusoids) and incorporating them into source separation models. Then, I will mention some optimization-based approaches to this problem, which leverage the consistency property of the time-frequency transform. In the next part, I will introduce a phase-aware probabilistic framework based on the von Mises and anisotropic Gaussian distributions, which makes it possible to include phase priors in a statistical model. If time allows, I will then combine these techniques with spectrogram decomposition models (nonnegative matrix factorization, deep neural networks) for the joint estimation of spectrograms and phase. Finally, I will present directions for future research, notably deep phase processing and its combination with time-domain approaches.
Paul Magron received the State Engineering degree from the École des Ponts ParisTech (Paris, France) in 2013, the M.Sc. degree in acoustics, signal processing and computer science applied to music from Sorbonne University (Paris, France) in 2013, and the Ph.D. degree in signal processing from Télécom ParisTech (Paris, France) in 2016. From 2017 to 2019, he worked as a postdoctoral researcher within the Audio Research Group, Tampere University (Tampere, Finland). From 2019 to 2021, he worked as a postdoctoral researcher within the Signal and Communications group, Institut de Recherche en Informatique de Toulouse (IRIT – Toulouse, France). Since October 2021, he has been a tenured research scientist (Chargé de Recherche INRIA) with LORIA (Nancy, France). His research interests include audio signal processing, sound source separation, phase recovery, nonnegative matrix factorization, probabilistic modeling and music recommendation. He has authored about 20 scientific publications on these topics, and is the recipient of the IWAENC 2018 best paper award for his work on complex nonnegative matrix factorization with beta-divergences.
More on the speaker’s website