This month, the laboratory is pleased to hold its sixth semi-annual workshop and steering committee meeting.

Programme

9:00 Welcome & coffee

9:25 Introduction

9:30 – 10:30 Slim Essid, “Realtime Machine Listening at the Edge”

Abstract

I will present an overview of our research and development effort within AUDIBLE, a BPI-funded project which aims to revolutionise hearable technologies (especially TWS/earbuds) by developing a platform that will enable unprecedented use cases through artificial intelligence (AI) innovations, a highly energy-efficient and powerful DSP and AI processor, and the integration of a miniaturized biometric sensor. The focus will be on the realtime machine listening solutions we are developing at ADASP, including speech enhancement and speaker diarization, acoustic scene analysis, music characterization and adaptive rendering. I will describe our end-to-end approach, from data collection, through the development of deep learning systems, to efficient implementation for realtime execution at the edge.

Bio

Slim Essid is Full Professor at Télécom Paris and the coordinator of the Audio Data Analysis and Signal Processing (ADASP) group. He received the Ph.D. degree from the Université Pierre et Marie Curie (UPMC) in 2005 and the habilitation (HDR) degree from UPMC in 2015. Over the past 15 years, he has been involved in various French and European research projects. He has collaborated with 14 post-docs and has graduated 17 PhD students; he is currently co-advising 10 others. He has published over 150 peer-reviewed conference and journal papers with more than 100 distinct co-authors. He regularly serves as a reviewer for various machine learning, signal processing, audio and multimedia conferences and journals, including various IEEE transactions, and as an expert for research funding agencies.

10:30 – 11:15 ADASP PhD Presentations

David Perera, “Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing”, NeurIPS 2024

Hugo Malard, “An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment”, NeurIPS 2024

Victor Letzelter, “ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation”, NeurIPS 2024

11:15 Coffee break

11:30 – 12:30 Sebastian Stober, “A Roadmap Towards Understanding the Essence of Music”

Abstract

What is music? Is it “the language of us all” (The Cat Empire – How to explain) or even “the language of our soul”, as the same line can be slightly misheard? Surely, it must be more than just structured sound or something that sounds like the millions of songs used to train a state-of-the-art generative artificial intelligence (AI) for music. The former seems too wide and the latter too narrow, leading to just more of the same and limiting true creativity.

Over millennia, humans have radically transformed music multiple times across cultures. Even if we ignore avant-garde works like John Cage’s “4′33″”, which some will say is not music to begin with, composers consistently invent new kinds of music in an open-ended creative process. At the same time, one of the greatest challenges in Music Information Retrieval (MIR) is to learn what principles actually define music.

In this talk, I would like to propose a radically new approach to answering this question that originated in discussions at Dagstuhl Seminar 24302 on “Learning with Music Signals” in July 2024. It builds upon the hypothesis that music perception and cognition are based on mechanisms rooted deeply in the human brain – likely even pre-dating the emergence of language – and that they are additionally shaped by our cultural exposure to existing music. Given these two ingredients – first principles of music perception and cognition plus cultural musical memory – new music could be created that is still based on the first principles and that relates to the musical memory but at the same time goes significantly beyond previously known music. If an open-ended creative process were guided by the right first principles, its outcome would still be perceived as music. This could uncover new insights into what defines music and expand creative boundaries.

Bio

Sebastian Stober is Professor of Artificial Intelligence at the Otto-von-Guericke-University Magdeburg, Germany. He studied computer science with a focus on intelligent systems in Magdeburg until 2005 and received his PhD with distinction on the topic of adaptive methods for user-centered organization of music collections in 2011. From 2013 to 2015, he was a postdoctoral fellow at the Brain and Mind Institute in London, Ontario, where he pioneered deep learning techniques for studying brain activity during music perception and imagination. Afterwards, he was head of the Machine Learning in Cognitive Science Lab at the University of Potsdam, before returning to Magdeburg in 2018. In his current research, he investigates and develops generative models for music and speech as well as methods to better understand what an artificial intelligence has learned and how it solves specific problems. To this end, he combines the fields of artificial intelligence and machine learning with cognitive neuroscience and music information retrieval.

12:30 Lunch break

14:00 Steering committee meeting