This month, the laboratory is pleased to host its seventh semi-annual workshop and steering committee meeting.

Programme

9:00 Welcome & coffee

9:25 Introduction

9:30 – 10:30 Meinard Müller – International Audio Laboratories Erlangen, Erlangen, Germany

Loss Functions Matter: Three Case Studies in Informed Loss Design

Abstract: Loss functions are fundamental to deep learning, as they quantify the discrepancy between model predictions and reference targets. They guide the optimization process and have a direct impact on model performance and convergence. Nevertheless, loss design and configuration are often treated as secondary aspects, which can result in unstable training and solutions that fail to capture the structure and objectives of the task. In this presentation, with a focus on audio and music processing, we highlight the importance of well-designed loss functions through three specific case studies. First, we revisit the Multi-Scale Spectral Loss for audio synthesis and analyze how different configurations affect gradient quality and training stability. Second, we introduce a loss function for hierarchical classification that promotes consistency across multiple levels of semantic attributes (e.g., singing activity, gender, and voice type in music audio classification). Third, we present a differentiable version of dynamic time warping, designed for learning from weakly aligned data, and discuss strategies for improving training stability. Rather than providing a general overview, we focus on concrete examples to show how incorporating prior knowledge and task-specific structure into loss functions can lead to more robust and interpretable learning outcomes.

Bio: Meinard Müller received the Diploma degree (1997) in mathematics and the Ph.D. degree (2001) in computer science from the University of Bonn, Germany. Since 2012, he has held a professorship for Semantic Audio Signal Processing at the International Audio Laboratories Erlangen, a joint institute of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer Institute for Integrated Circuits IIS. His recent research interests include music processing, music information retrieval, audio signal processing, and motion processing. He was a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (2010-2015), a member of the Senior Editorial Board of the IEEE Signal Processing Magazine (2018-2022), and a member of the Board of Directors of the International Society for Music Information Retrieval (2009-2021, serving as its president in 2020/2021). In 2020, he was elevated to IEEE Fellow for contributions to music signal processing. Currently, he also serves as Editor-in-Chief for the Transactions of the International Society for Music Information Retrieval (TISMIR). Besides his scientific research, Meinard Müller has been very active in teaching music and audio processing. He has given numerous tutorials at major conferences, including ICASSP (2009, 2011, 2019) and ISMIR (2007, 2010, 2011, 2014, 2017, 2019, 2023, 2024). Furthermore, he wrote a monograph titled “Information Retrieval for Music and Motion” (Springer 2007) as well as a textbook titled “Fundamentals of Music Processing” (Springer-Verlag 2015).

10:30 – 11:15 ADASP PhD presentations (5-minute presentations, followed by 5-minute Q&A after each pair)

Louis Bahrman: “Hybrid Deep Learning for Audio Restoration: the Case of Dereverberation”
Côme Peladeau: “Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values”

Manvi Agarwal: “Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation”
Teysir Baoueb: “Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models”

Michel Olvera: “Integrating Knowledge Graphs with Audio-Language Models”
Xuanyu Zhuang: “Episode-specific Fine-tuning for Metric-based Few-shot Learners with Optimization-based Training”

11:15 Coffee break

11:30 – 12:30 Sølvi Ystad – Laboratory PRISM (Perception, Representations, Image, Sound, Music), Marseille, France

Perceptual Engineering as a Means to Investigate Human Perception

Abstract: When digital sound synthesis and computer music first emerged in the 1960s, the renowned scientist John Pierce made this enthusiastic remark about computer-generated sounds: “Wonderful things would come out of that box if only we knew how to evoke them.” Despite the many synthesis algorithms developed since then, evocative control remains a fundamental challenge—one that continues to prevent many potential users from incorporating sound synthesis into their applications. In this presentation, I will describe a methodology that my colleagues and I have developed over the past 20 years, combining digital synthesis with experimental psychology. This approach, often referred to as perceptual engineering, has allowed us to identify perceptually relevant sound morphologies. These morphologies not only help us better understand how we perceive our environment but also enable intuitive sound control using verbal labels that reference the source of the sound-producing system. In recent years, we have extended this framework to explore multimodal perception and immersive environments. The methodology will be illustrated through videos and sound examples.

Bio: Sølvi Ystad obtained her degree in electronic engineering from the Norwegian Institute of Technology – NTH (Norges Tekniske Høgskole), Trondheim, Norway, in 1992 and her doctorate in acoustics from the University of Aix-Marseille II, France, in 1998. After a postdoctoral stay at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, California, she obtained a research position at the Centre National de la Recherche Scientifique (CNRS) in 2002. In 2017, she co-founded the interdisciplinary laboratory PRISM – Perception, Representations, Image, Sound, Music (www.prism.cnrs.fr), of which she has been the director since January 2024. Her research activities focus mainly on auditory and multimodal perception, which she investigates by combining numerical modelling and experimental psychology.

12:30 Lunch break

14:00 – 15:00 Steering committee meeting (among organizers and laboratory partners)