Constrained Learned Feature Extraction for Acoustic Scene Classification

By: Teng Zhang; Ji Wu

Deep neural networks (DNNs) have proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to implement the discrete Fourier transform operation 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters in the module to facilitate fast adaptation to complex and variable acoustic signals. This module gives neural networks the ability to model audio signals from raw waveforms, without extra fast Fourier transform and filter bank patches. We then use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), or recurrent neural network (RNN) models. We evaluate the proposed strategy using four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018) as target tasks. We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and classification performance. For instance, the proposed approach can reduce inference time by 8x while reducing the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
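
The abstract does not spell out how the simulated Fourier transform module is built, so the following is only a minimal sketch of the general idea it points to: a discrete Fourier transform can be written as a pair of real-valued matrix multiplications, and those matrices could in principle be treated as trainable layer weights. The function names (`dft_weights`, `simulated_dft`), the frame length of 64, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def dft_weights(n):
    """Build real-valued weight matrices that reproduce an n-point DFT.

    Hypothetical helper: in a learnable module these matrices would be
    the initial values of trainable parameters.
    """
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    t = np.arange(n).reshape(1, -1)   # time index (columns)
    angle = 2.0 * np.pi * k * t / n
    # exp(-i * angle) = cos(angle) - i * sin(angle)
    return np.cos(angle), -np.sin(angle)

def simulated_dft(frames, w_real, w_imag):
    """Apply the weights to a batch of raw-waveform frames of shape (batch, n)."""
    real = frames @ w_real.T
    imag = frames @ w_imag.T
    return real, imag

# Compare against NumPy's FFT on random frames (illustrative data only).
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 64))
w_real, w_imag = dft_weights(64)
real, imag = simulated_dft(frames, w_real, w_imag)

reference = np.fft.fft(frames, axis=-1)
max_err = float(np.max(np.abs((real + 1j * imag) - reference)))
print("max deviation from np.fft.fft:", max_err)
```

With the weights left at their initial values, the layer matches the exact transform up to floating-point error. Updating them by gradient descent would let the transform drift away from the exact DFT and adapt to the data, which is one plausible reading of the "learnable parameters" mentioned above.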
