SPS Webinar: Multi-Scale Spectral Loss Revisited
Date: 23 July 2025
Time: 9:00 AM ET (New York Time)
Presenter(s): Mr. Simon Schwär
Based on the IEEE Xplore® article:
"Multi-Scale Spectral Loss Revisited"
Published: IEEE Signal Processing Letters, November 2023.
Download article: The original article is open access and publicly available for download.
Abstract
The Multi-Scale Spectral (MSS) loss is widely used for comparing audio signals, offering a good balance between temporal and spectral resolution while tolerating perceptually irrelevant phase differences between waveforms. However, the configuration of this loss function, including parameters such as window type and size, hop size, and magnitude compression, is often chosen empirically, without explicitly considering its impact on loss behavior. This is particularly relevant in the context of differentiable digital signal processing (DDSP), where loss gradients are back-propagated through fixed DSP building blocks before they are used as learning signals. In this webinar, the presenter gives an overview of various MSS loss configurations and analyzes the effects of individual loss parameters in detail. Using common DDSP components such as oscillators and filters, the presenter illustrates cases where trade-offs between configuration choices become important. Furthermore, the presenter shows examples where the MSS loss fails entirely to provide meaningful gradients and discusses potential workarounds proposed in the literature.
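For readers unfamiliar with the loss, the following is a minimal NumPy sketch of one possible MSS configuration, not the specific variants analyzed in the article. The Hann window, the set of window sizes, the 75% overlap, the sum of linear and log-magnitude L1 distances, and the `eps` offset are all assumptions chosen for illustration. The function names `stft_mag` and `mss_loss` are hypothetical.

```python
import numpy as np


def stft_mag(x, win_size, hop):
    """Magnitude STFT of a 1-D signal using a Hann window (assumed window type).

    Assumes len(x) >= win_size; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack([x[i * hop : i * hop + win_size] for i in range(n_frames)])
    # Taking the magnitude discards the phase of each time-frequency bin.
    return np.abs(np.fft.rfft(frames * window, axis=-1))


def mss_loss(x, y, win_sizes=(2048, 1024, 512, 256, 128, 64), overlap=0.75, eps=1e-7):
    """Sum of L1 distances between linear and log magnitudes over several window sizes.

    All default values are illustrative assumptions, not recommendations.
    """
    total = 0.0
    for n in win_sizes:
        hop = max(1, int(n * (1 - overlap)))  # hop size derived from the overlap
        X, Y = stft_mag(x, n, hop), stft_mag(y, n, hop)
        linear_term = np.mean(np.abs(X - Y))
        # Log compression is one possible form of magnitude compression.
        log_term = np.mean(np.abs(np.log(X + eps) - np.log(Y + eps)))
        total += linear_term + log_term
    return float(total)


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate in Hz
    t = np.arange(sr) / sr
    reference = np.sin(2 * np.pi * 440.0 * t)
    estimate = np.sin(2 * np.pi * 660.0 * t)
    print("identical signals:", mss_loss(reference, reference))
    print("different pitch:  ", mss_loss(reference, estimate))
```

Each parameter named in the abstract maps to one knob here: window type (`np.hanning`), window size (`win_sizes`), hop size (`overlap`), and magnitude compression (the log term and `eps`). Changing any of them changes the loss surface, and in a DDSP setting therefore the gradients that reach the synthesis parameters.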
Biography
Simon Schwär (M'23) received the B.Eng. degree in audio engineering from Robert Schumann Conservatory in Düsseldorf, Germany, in 2017, and the M.Sc. degree in signal processing and communications engineering from Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, in 2022. He is currently pursuing the Ph.D. degree in the music processing group of Prof. Meinard Müller at International Audio Laboratories Erlangen, a joint institute of Friedrich-Alexander-Universität Erlangen-Nürnberg and the Fraunhofer Institute for Integrated Circuits IIS.
He was an Audio Software Developer at the Fraunhofer Institute for Integrated Circuits IIS from 2017 to 2021, working on real-time VR/AR audio rendering for complex scenes. His current research interests include the computational analysis of singing voice and musical instrument intonation, as well as musically meaningful loss functions between audio signals.