The laboratory is pleased to hold its third semi-annual workshop and steering committee meeting in the middle of this year.


9:00 Welcome & coffee

9:30 Introduction

9:35 – 10:15 Juanjo Bosch, Spotify, Research Scientist, “A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation”


“Basic-pitch” is a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). In this talk, we will discuss how we built and evaluated this efficient and simple model, which was experimentally shown to substantially outperform a comparable baseline at note detection. The model is trained to jointly predict frame-wise onsets, multipitch and note activations, and we experimentally showed that this multi-output structure improves the resulting frame-level note accuracy. We will also listen to examples of using (and misusing) this model for creative purposes, via our open-source Python library or demo website: thanks to its lightweight design, the model can run in the browser, and your audio doesn’t even leave your own computer.


Juanjo Bosch is a Research Scientist at Spotify (Paris). He completed his Ph.D. at the Music Technology Group at Universitat Pompeu Fabra (Barcelona), working on automatic melody extraction from music signals. He has participated in several international research projects and was a visiting researcher at Queen Mary University of London. He received his master’s degree in Sound and Music Computing from UPF and previously graduated as a Telecommunications Engineer from UPV Valencia. He has also worked as a research engineer at the Fraunhofer Institute and at Yamaha Corporation. His main research interests include automatic music analysis and generation, audio transformations, and creative applications of music information research.

10:15 – 10:55 Presentations by ADASP Ph.D. students

10:15 – 10:35 Victor Letzelter: “Multiple hypotheses generation for ambiguous tasks and application in spatial audio scene analysis”

10:35 – 10:55 David Perera: “Invariance-based semi-supervised representation learning for sound event detection”

10:55 Coffee break

11:10 – 12:05 Mathieu Lagrange, LS2N Nantes, CNRS research scientist, “Beyond Fourier. On the use of Timbre models for generative audio synthesis”


This talk will motivate the need for differentiable timbre models in order to effectively train and evaluate generative audio systems. Besides differentiability, several other properties are required for an effective model of the early stages of the human auditory system. I will first argue that Fourier-based timbre descriptors fall short of explicitly modeling the main dimensions of timbre. Next, I will present richer models: Spectro-Temporal Receptive Fields and the Joint Time-Frequency Scattering Transform. I will discuss their equivalence as timbre models and show applications of the latter to concrete problems in generative audio synthesis.


Mathieu Lagrange is a CNRS research scientist at LS2N. He obtained his PhD from the University of Bordeaux in 2004. Before joining CNRS, he was a scientist in Canada (University of Victoria, McGill University) and in France (Télécom Paris, Ircam). His research focuses on signal processing and machine learning algorithms applied to musical and environmental audio analysis and synthesis.

12:10 Lunch break

14:00 Steering committee