Generative Model Driven Representation Learning in a Hybrid Framework for Environmental Audio Scene and Sound Event Recognition

TMM Volume 22 Issue 1

By:

S. Chandrakala; S. L. Jayalakshmi

The analysis of sound information is helpful for audio surveillance, multimedia information retrieval, audio tagging, and forensic applications. Environmental audio scene recognition (EASR) and sound event recognition (SER) for audio surveillance are challenging tasks due to the presence of multiple sound sources, background noise, and overlapping or polyphonic contexts. We focus on learning robust and compact representations for environmental audio scenes and sound events using mel-frequency cepstral coefficients as basic features, which have proved to be effective in speech and audio-related tasks. In this paper, we propose a common hybrid model-based framework that learns representations with the help of generative models. We explore instance-specific adapted Gaussian mixture models for environmental audio scenes and instance-specific hidden Markov models for sound events to compute robust, compact, and discriminative representations. A discriminative model-based classifier is then used to recognize these representations as environmental audio scenes and sound events. The performance of the proposed approaches is evaluated using the DCASE2013 scene dataset and the TUT-DCASE2016 scene dataset for the EASR task. The Environmental Sound Classification (ESC-10) and UrbanSound8K datasets are used for the SER task. The recognition accuracy of the proposed framework is significantly better than that of many state-of-the-art approaches proposed in the recent literature. The discriminative nature of the model-driven representations leads to improved efficiency for the EASR and SER tasks. The proposed approaches are better suited to tasks with limited training data.
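As a rough illustration of what an "instance-specific adapted Gaussian mixture model" representation might look like, the sketch below adapts the means of a shared diagonal-covariance GMM to the frames of a single clip and stacks the adapted means into one fixed-length vector. The abstract does not give the authors' exact procedure, so everything here is an assumption: the mean-only MAP adaptation rule, the `relevance` factor, the hypothetical function name `map_adapt_supervector`, and the random numbers standing in for MFCC frames and for a trained background model.

```python
import numpy as np


def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal GMM to one clip (illustrative).

    frames:    (T, D) feature frames of a single clip, e.g. MFCCs
    weights:   (K,)   mixture weights of the shared background GMM
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    Returns a (K * D,) vector of stacked adapted means.
    """
    # Log-density of every frame under every component -> shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    # Posterior responsibilities, normalised per frame in a stable way.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Soft counts and per-component clip means.
    n_k = post.sum(axis=0)                                    # (K,)
    clip_means = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between clip statistics and the background means.
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * clip_means + (1.0 - alpha) * means
    return adapted.ravel()


# Synthetic stand-ins (assumptions): a 4-component background model over
# 13-dimensional features and one clip of 200 frames.
rng = np.random.default_rng(0)
K, D = 4, 13
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
clip = rng.normal(loc=0.5, size=(200, D))

vec = map_adapt_supervector(clip, weights, means, variances)
print(vec.shape)  # (52,)
```

Each clip, whatever its duration, maps to a vector of the same length (K × D), which is what lets a discriminative classifier operate on whole recordings. A support vector machine would be one plausible choice for that second stage, though the abstract does not name the classifier.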
