At the beginning of this year, the laboratory is pleased to hold its second semi-annual workshop and steering committee meeting.
8:30 Welcome & coffee
9:10 – 10:05 Mark Plumbley, University of Surrey, “AI for Sound”
Imagine you are standing on a street corner in a city. Close your eyes: what do you hear? Perhaps some cars and buses driving on the road, footsteps of people on the pavement, beeps from a pedestrian crossing, and the hubbub of talking shoppers. You can do the same in a kitchen as someone is making breakfast, or as you are travelling in a vehicle. Now, following the success of AI and machine learning technologies for speech and image recognition, we are beginning to build computer systems to automatically recognize real-world sound scenes and events. In this talk, we will explore some of the work going on in this rapidly expanding research area, and discuss some of the potential applications emerging for sound recognition, from home security and assisted living to environmental noise and sound archives. We will also outline how we are adopting participatory methods, such as a virtual world cafe approach, to let stakeholders direct project outcomes, and so help us realise the potential benefit of sound sensing to society and the economy.
Prof. Mark Plumbley is Professor of Signal Processing at the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey, in Guildford, UK.
He is an expert on analysis and processing of audio, using a wide range of signal processing and machine learning methods. He led the first international data challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), and is a co-editor of the recent book on “Computational Analysis of Sound Scenes and Events” (Springer, 2018).
He currently holds a 5-year EPSRC Fellowship “AI for Sound” on automatic recognition of everyday sounds. He is a Member of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing, and a Fellow of the IET and IEEE.
10:05 – 11:05 ADASP PhD presentations
11:05 Coffee break
11:20 – 12:00 Alexandre Defossez, Meta, “Producing audio with deep learning, from expectations to distributions”
Deep learning has opened up new possibilities for the processing and generation of audio. Some tasks can be cast as regression, i.e. producing the expected output given the input, as with source separation under usual conditions. Other tasks require modelling the probability distribution of the output. Even simple use cases like de-clipping can have non-deterministic outputs, and thus require some form of distribution modelling.
In this talk, I will cover some of the recent trends in audio modelling, including adversarial losses, auto-regressive modelling, and diffusion models, and discuss the different trade-offs they achieve. I will further discuss the potential of working in a latent space where the intrinsic complexity of the audio domain is abstracted away, allowing downstream tasks to use the simpler “expected output” approach.
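The distinction between expected outputs and distributions can be illustrated with a toy numerical sketch (not from the talk; all values here are illustrative): when the target output is multimodal, a regressor trained with mean-squared error is driven towards the conditional mean, which may lie in none of the plausible modes.

```python
import random

random.seed(0)

# Toy target with two equally likely modes at -1.0 and +1.0, standing in
# for a task with several valid outputs (e.g. plausible reconstructions
# of a clipped signal).
samples = [random.choice([-1.0, 1.0]) for _ in range(10_000)]

def mse(pred: float, ys: list) -> float:
    """Mean squared error of a single point prediction against samples."""
    return sum((pred - y) ** 2 for y in ys) / len(ys)

# The MSE-optimal point prediction is the sample mean, close to 0.0:
# a value that is neither of the two plausible outputs.
mean_pred = sum(samples) / len(samples)

print(round(mean_pred, 2))          # near 0.0
print(round(mse(mean_pred, samples), 2))  # near 1.0
print(round(mse(1.0, samples), 2))  # near 2.0, although +1.0 is a valid mode
```

The sketch shows why a plain regression objective "averages" modes, whereas committing to one valid mode is penalised more heavily under MSE; handling such cases is what motivates explicit distribution modelling (adversarial, auto-regressive, or diffusion approaches).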
Alexandre Defossez is a research scientist at Meta AI in Paris, working on deep-learning-based signal processing, primarily for audio applications (source separation, compression, generation), but also for brain signal analysis.
He completed his PhD at INRIA Paris while being a resident at Meta AI, under the supervision of Léon Bottou, Francis Bach, and Nicolas Usunier.
12:10 Lunch break
14:00 Steering committee