Every month, the joint laboratory invites external speakers to give seminars for its partners.

Yoshiaki BANDO (AIST, Japan) – Data-efficient audio scene analysis via spatial self-supervised learning

Abstract:

In this talk, we present spatial self-supervised learning for audio scene analysis. The scarcity of supervised data is a fundamental and widely recognized problem in audio scene analysis. To address this issue, we combine neural modeling and statistical signal processing in a unified framework. Specifically, we first introduce neural full-rank spatial covariance analysis (neural FCA), which enables multichannel source separation using only mixture signals. We then demonstrate its practical applications to sound event localization and detection (SELD) and joint diarization and separation. We further discuss the potential of spatial self-supervised learning as a data-efficient framework for audio scene analysis tasks.
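
As background for the abstract, the following sketch (in notation of our own choosing, not taken from the talk) outlines the classical full-rank spatial covariance model that neural FCA builds on: the multichannel mixture STFT is modeled as a sum of source images, each a zero-mean complex Gaussian whose covariance factorizes into a time-frequency power and a full-rank spatial covariance matrix. That a neural network parameterizes the source powers and that both parts are estimated from mixtures alone via variational inference is our reading of the speaker's published work; the exact formulation will be given in the talk.

\[
\mathbf{x}_{ft} = \sum_{n=1}^{N} \mathbf{y}_{nft},
\qquad
\mathbf{y}_{nft} \sim \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{0},\; \lambda_{nft}\,\mathbf{H}_{nf}\right),
\]

where \(\mathbf{x}_{ft} \in \mathbb{C}^{M}\) is the \(M\)-channel mixture at frequency \(f\) and time \(t\), \(\lambda_{nft} \ge 0\) is the power of source \(n\), and \(\mathbf{H}_{nf}\) is its full-rank \(M \times M\) spatial covariance matrix, so that \(\mathbf{x}_{ft} \sim \mathcal{N}_{\mathbb{C}}\bigl(\mathbf{0}, \sum_{n} \lambda_{nft}\,\mathbf{H}_{nf}\bigr)\). Given estimates of \(\lambda\) and \(\mathbf{H}\), each source image can be recovered with a multichannel Wiener filter:

\[
\hat{\mathbf{y}}_{nft}
= \lambda_{nft}\,\mathbf{H}_{nf}\Bigl(\textstyle\sum_{n'} \lambda_{n'ft}\,\mathbf{H}_{n'f}\Bigr)^{-1}\mathbf{x}_{ft}.
\]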

tags: audio scene analysis, spatial audio, self-supervised learning, variational inference

Bio:
Yoshiaki Bando (坂東 宜昭), Bando Lab (坂東研究室)

Yoshiaki Bando is a Senior Researcher at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. He received his Ph.D. in Informatics from Kyoto University in 2018. His research interests include audio and speech signal processing and robot audition, with a particular focus on unsupervised learning and multichannel signal processing. He has worked on a wide range of problems, such as sound event detection, distant speech recognition, and blind source separation, aiming to build robust audio systems for real-world environments.

Speaker’s website | Speaker’s Google Scholar