Each month, the laboratory invites outside speakers to give seminars intended for its partners.
This month, Salah Zaiem, a third-year PhD student in the ADASP audio group at Télécom Paris, will give a presentation entitled:
Self-supervised learning (SSL) has recently made it possible to leverage large datasets of unlabeled speech signals and reach impressive performance with only small amounts of annotated data. The large number of proposed approaches has fostered the need for, and the rise of, extended benchmarks that evaluate their performance on a set of downstream tasks probing various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely on a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. More concerningly, our study reveals that benchmarking with limited decoders may cause a counterproductive increase in the size of the developed SSL models.
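The evaluation setup the abstract describes can be sketched as follows: a frozen SSL encoder produces representations, and only a downstream decoder is trained on top of them. This minimal numpy sketch is purely illustrative; the random-projection "encoder" and the two toy decoder heads (a linear probe vs. a one-hidden-layer MLP) are hypothetical stand-ins, not the models used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen SSL encoder: a fixed random projection.
# (A real pipeline would use e.g. pretrained wav2vec-style features.)
def frozen_encoder(wave, W_enc):
    return np.tanh(wave @ W_enc)           # (batch, feat_dim) representations

# Two downstream decoders mapping the same frozen features to class logits:
def linear_decoder(feats, W):
    return feats @ W                       # simple linear probe

def mlp_decoder(feats, W1, W2):
    return np.maximum(feats @ W1, 0) @ W2  # one hidden ReLU layer

batch, sig_len, feat_dim, hidden, n_classes = 4, 160, 32, 64, 10
wave = rng.standard_normal((batch, sig_len))
W_enc = rng.standard_normal((sig_len, feat_dim)) / np.sqrt(sig_len)

feats = frozen_encoder(wave, W_enc)        # encoder weights stay fixed
logits_lin = linear_decoder(feats, rng.standard_normal((feat_dim, n_classes)))
logits_mlp = mlp_decoder(feats,
                         rng.standard_normal((feat_dim, hidden)),
                         rng.standard_normal((hidden, n_classes)))
```

The point the abstract makes is that swapping `linear_decoder` for `mlp_decoder` (or any other head) can reorder the leaderboard of SSL encoders, even though the encoders themselves are unchanged.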
Salah Zaiem is currently pursuing a Ph.D. at Télécom Paris, France, supervised by Slim Essid and Titouan Parcollet. His research focuses on understanding and motivating the design choices in self-supervised learning pipelines for speech.
More on the speaker's website.