SPS Webinar: BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations

Date: 13 June 2024
Time: 7:00 AM ET (New York Time)
Presenter(s): Mr. Daisuke Niizumi

Original article: Download Open Access article


Self-supervised learning (SSL) models perform remarkably well in various domains, including audio. In this webinar, the presenter will introduce an audio SSL method, Bootstrap Your Own Latent (BYOL) for Audio (BYOL-A), that learns a general-purpose audio representation effective across a range of audio tasks. The presenter hypothesizes that such representations should provide multi-aspect information to serve the differing needs of diverse tasks. BYOL-A learns a representation that is robust to sound changes, such as changes in pitch and background noise, and combines multi-layer features. As a result, in the experiments BYOL-A demonstrates generalizability, with the best average result of 72.4% across nine tasks and the best accuracy of 57.6% on the VoxCeleb1 speaker identification task. The presenter will examine how each BYOL-A component contributes to performance. The presenter will also introduce use cases from other studies, such as video understanding, and show how those studies used BYOL-A in their deep learning frameworks.


Daisuke Niizumi received the B.S. and M.S. degrees from the Department of Computer Science and Systems Engineering of the Kyushu Institute of Technology, Kitakyushu, Japan, in 1995 and 1997, respectively.

Mr. Niizumi joined NTT Corporation in 2020 as a research scientist. From 1997 to 2020, he was a senior software and machine learning engineer/manager at several consumer electronics companies. His research interests include representation learning, self-supervised learning, and multimodal deep learning.