Constrained Learned Feature Extraction for Acoustic Scene Classification

TASLP Volume 27 Issue 8

By: Teng Zhang; Ji Wu

Deep neural networks (DNNs) have proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to implement the discrete Fourier transform operation 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters in the module to facilitate fast adaptation to complex and variable acoustic signals. This module gives neural networks the ability to model audio signals from raw waveforms, without extra fast Fourier transform and filter bank patches. Then, we use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), or recurrent neural network (RNN) models. We evaluate the proposed strategy using four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018) as target tasks. We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and performance. For instance, the proposed approach can reduce inference time by 8x while reducing the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
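The abstract does not spell out how the simulated Fourier transform module is built, so the following is only a minimal sketch of the general idea behind it: the discrete Fourier transform of a real frame is a linear map, so it can be written as two bias-free dense layers whose weights start at the cosine and sine bases and could then be treated as trainable parameters. The function names (`dft_weights`, `linear`, `magnitude_spectrum`, `reference_dft`) are hypothetical, the code uses only the Python standard library, and it does not reproduce the authors' module, its constraints, or the reported 8x speedup.

```python
import cmath
import math


def dft_weights(n):
    """Return (cos_w, sin_w): two n x n real matrices holding the DFT basis.

    Entry [k][t] is cos(2*pi*k*t/n) and -sin(2*pi*k*t/n) respectively, so
    applying them to a real frame gives the real and imaginary DFT parts.
    In a learned front end these would be the initial, trainable weights.
    """
    cos_w = [[math.cos(2 * math.pi * k * t / n) for t in range(n)]
             for k in range(n)]
    sin_w = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)]
             for k in range(n)]
    return cos_w, sin_w


def linear(weights, x):
    """Bias-free dense layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def magnitude_spectrum(frame, cos_w, sin_w):
    """Magnitude spectrum of a raw-waveform frame via two dense layers."""
    real = linear(cos_w, frame)
    imag = linear(sin_w, frame)
    return [math.hypot(r, i) for r, i in zip(real, imag)]


def reference_dft(frame):
    """Direct complex DFT, used only to check the dense-layer version."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


if __name__ == "__main__":
    n = 16
    cos_w, sin_w = dft_weights(n)
    # A pure tone that completes exactly two cycles inside the frame.
    frame = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    mags = magnitude_spectrum(frame, cos_w, sin_w)
    print([round(m, 6) for m in mags])
```

With the untrained (initial) weights, the two-cycle tone puts its energy in bins 2 and 14, matching the direct DFT. Training would move the weights away from this exact basis, which is where the adaptation described in the abstract would come in.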

