Every month, the joint laboratory invites outside speakers to take part in seminars for its partners.

Stefan Lattner: “Advances in Music Information Retrieval at Sony CSL: From Self-Supervised Methods to Music Online Learning”

Abstract: The music team at Sony Computer Science Laboratories focuses on music and audio generation for music production, as well as music information retrieval (MIR). Our work mainly revolves around generative AI, but in this talk I will discuss our recent work on MIR and music analysis. Self-supervised methods have gained popularity in machine learning research, and I will highlight three publications that implement this concept: singer identity representation learning for singer similarity estimation and classification, drum sample retrieval by musical context, and a lightweight pitch estimation system. Additionally, I will introduce a differentiable short-term memory that can be pre-trained on symbolic music data and then learn the distribution of a single song in an online fashion. Finally, I will delve into a study on typicality in musical sequences, a concept that has recently gained attention in natural language processing.

Bio: Stefan Lattner is research leader of the music team at Sony CSL Paris, where he focuses on generative AI for music production, music information retrieval, and computational music perception. He earned his PhD in 2019 from Johannes Kepler University (JKU) in Linz, Austria, following research at the Austrian Research Institute for Artificial Intelligence in Vienna and the Institute of Computational Perception in Linz. His studies centered on the modeling of musical structure, encompassing transformation learning and computational relative pitch perception. As a musician, he is interested in human-computer interaction in music creation, live staging, computational aesthetics, and information theory and perception in music. As a computer scientist, he specializes in generative sequence models, computational short-term memories, few-shot learning, contrastive and multimodal learning, invariance/transformation learning, structure learning, generative adversarial networks, and diffusion models. In 2019, Lattner received the Best Paper Award at ISMIR for his work “Learning Complex Basis Functions for Invariant Representations of Audio.”

Primer of ADASP presentations at ISMIR 2023

  • Changhong Wang “Transfer Learning and Bias Correction with Pre-trained Audio Embeddings”
  • Morgan Buisson “A Repetition-based Triplet Mining Approach for Music Segmentation”
  • Bernardo Torres “Singer Identity Representation Learning using Self-Supervised Techniques”
  • Alain Riou “PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective”
  • Geoffroy Peeters “Self-Similarity-Based and Novelty-Based Loss for Music Structure Analysis”