Constrained Learned Feature Extraction for Acoustic Scene Classification

By: Teng Zhang; Ji Wu

Deep neural networks (DNNs) have proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to implement the discrete Fourier transform operation 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters in the module to facilitate fast adaptation to complex and variable acoustic signals. This module gives neural networks the ability to model audio signals from raw waveforms, without extra fast Fourier transform and filter bank patches. We then use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), or recurrent neural network (RNN) models. We evaluate the proposed strategy using four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018) as target tasks. We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and performance. For instance, the proposed approach can reduce inference time by 8x while reducing the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
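The abstract does not spell out how the simulated Fourier transform module is built, so the following is only an illustrative sketch of the general idea it builds on: a discrete Fourier transform can be written as a pair of real-valued linear maps, whose weights could in principle be treated as learnable parameters and adapted during training. This is not the authors' module. The function names (`dft_init_weights`, `spectral_layer`, `reference_dft_magnitude`) and the frame length of 16 are assumptions made for the example, and no training step is shown.

```python
import cmath
import math


def dft_init_weights(frame_len, n_bins):
    """Build two real weight matrices initialized to the DFT basis.

    Hypothetical helper: in a learnable layer these would be the initial
    values of trainable parameters rather than fixed constants.
    """
    cos_w = [[math.cos(2 * math.pi * k * n / frame_len) for n in range(frame_len)]
             for k in range(n_bins)]
    sin_w = [[-math.sin(2 * math.pi * k * n / frame_len) for n in range(frame_len)]
             for k in range(n_bins)]
    return cos_w, sin_w


def matvec(weights, x):
    """Plain matrix-vector product, i.e. a bias-free linear layer."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def spectral_layer(frame, cos_w, sin_w):
    """Apply both linear maps to a raw waveform frame and return magnitudes."""
    real = matvec(cos_w, frame)
    imag = matvec(sin_w, frame)
    return [math.hypot(r, i) for r, i in zip(real, imag)]


def reference_dft_magnitude(frame, n_bins):
    """Direct DFT from the textbook definition, used only as a check."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n_bins)]


frame_len = 16                      # assumed frame length for the example
n_bins = frame_len // 2 + 1         # non-redundant bins for a real input
# Raw waveform frame: a sinusoid completing 3 cycles over the frame.
frame = [math.sin(2 * math.pi * 3 * t / frame_len) for t in range(frame_len)]

cos_w, sin_w = dft_init_weights(frame_len, n_bins)
mags = spectral_layer(frame, cos_w, sin_w)
reference = reference_dft_magnitude(frame, n_bins)
peak_bin = max(range(n_bins), key=lambda k: mags[k])
```

With the weights left at their initial values, the layer reproduces the DFT magnitudes of the frame, and the energy of the test sinusoid concentrates in bin 3. Making `cos_w` and `sin_w` trainable would let gradient updates move the layer away from the exact Fourier basis, which is the kind of adaptation the abstract alludes to.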
