Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits

TASLP Volume 31 | 2023


TASLPRO Articles

By:

Mrinmoy Bhattacharjee; S. R. M. Prasanna; Prithwijit Guha

Detection of speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy and continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, which appear as vertical striations in spectrograms. When speech and music overlap, it may be difficult for automatic feature-learning systems to extract the class-specific horizontal and vertical striations from the combined spectrogram representation. Separating the harmonic and percussive components as a preprocessing step before training may therefore aid the classifier. Thus, this work proposes using a harmonic-percussive source separation method to generate features for better detection of speech and music signals. This work also explores the traditional and cascaded-information multi-task learning (MTL) frameworks to design better classifiers. The MTL framework aids training of the main task by simultaneously learning several related auxiliary tasks. Results are reported on both synthetically generated speech-music overlapped signals and real recordings, and four state-of-the-art approaches are used for performance comparison. Experiments show that features based on the harmonic and percussive decomposition of spectrograms perform better, and that MTL-based classifiers further improve performance.
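The key intuition is that harmonics form horizontal striations and percussive events form vertical ones in a spectrogram. One widely used way to separate them is median filtering: smoothing along time preserves horizontal lines, and smoothing along frequency preserves vertical ones. The abstract does not state which harmonic-percussive separation variant the authors use, so the following is only a minimal NumPy sketch of the median-filtering approach, not the paper's pipeline. The kernel widths, the soft-mask power, edge padding, and stacking the two components as a two-channel classifier input are all illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(x: np.ndarray, width: int, axis: int) -> np.ndarray:
    """Median-filter a 2-D array along one axis (odd width, edge padding)."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    pad = [(0, 0), (0, 0)]
    pad[axis] = (width // 2, width // 2)
    padded = np.pad(x, pad, mode="edge")
    # Windows are appended as a trailing axis; output keeps the input shape.
    windows = sliding_window_view(padded, width, axis=axis)
    return np.median(windows, axis=-1)


def hpss(S: np.ndarray, kernel_time: int = 17, kernel_freq: int = 17,
         power: float = 2.0, eps: float = 1e-10):
    """Split a magnitude spectrogram S (n_freq, n_frames) into harmonic and
    percussive parts with soft masks. Kernel sizes and power are assumptions."""
    # Smoothing along time (axis=1) keeps horizontal, harmonic structure.
    H = median_filter_1d(S, kernel_time, axis=1)
    # Smoothing along frequency (axis=0) keeps vertical, percussive structure.
    P = median_filter_1d(S, kernel_freq, axis=0)
    Hp, Pp = H ** power, P ** power
    total = Hp + Pp + eps
    # Soft masks sum to ~1 wherever either component has energy.
    return (Hp / total) * S, (Pp / total) * S


# Toy check: one horizontal line (harmonic-like) and one vertical line (percussive-like).
S = np.zeros((64, 64))
S[10, :] = 1.0   # constant-frequency tone across all frames
S[:, 40] = 1.0   # broadband onset in a single frame
harm, perc = hpss(S)
features = np.stack([harm, perc], axis=0)  # assumed 2-channel classifier input
print(features.shape)  # (2, 64, 64)
```

In this toy input the tone ends up almost entirely in `harm`, the onset in `perc`, and the single bin where they cross is split evenly. Soft masks are used rather than hard binary masks so that the two components add back up to the original spectrogram.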

Speech and music are the most frequently encountered audio categories in movies, TV shows, web series, and radio broadcasts. Researchers have studied speech vs. music classification for a long time, and state-of-the-art methods [1]–[4] can identify isolated speech and music segments with impressive accuracy. In most practical scenarios, however, speech and music occur as overlapping mixtures. For example, emotional scenes in movies and TV shows frequently have speech with background music to highlight the scene’s mood. If such segments are not identified beforehand and processed separately, they may degrade the performance of high-level applications such as automatic speech recognition and music information retrieval. Hence, this work focuses on discriminating isolated speech and music segments from their overlapping mixtures.

SPS Social Media

IEEE SPS Facebook Page https://www.facebook.com/ieeeSPS
IEEE SPS X Page https://x.com/IEEEsps
IEEE SPS Instagram Page https://www.instagram.com/ieeesps/?hl=en
IEEE SPS LinkedIn Page https://www.linkedin.com/company/ieeesps/
IEEE SPS YouTube Channel https://www.youtube.com/ieeeSPS

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel

© Copyright 2025 IEEE - All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.
