A Novel Modified Mel-DCT Filter Bank Structure With Application to Voice Activity Detection



By: R. Muralishankar; Debayan Ghosh; Sanjeev Gurugopinath

We propose a novel modified Mel-discrete cosine transform (MMD) filter bank structure that restricts the overlap of each filter response to its immediate neighbor. In contrast to the well-known triangular filters used to extract Mel-frequency cepstral coefficients (MFCC), the proposed filter structure has a smoother response and performs the discrete cosine transform and Mel-scale filtering in a single operation. It is known that using MFCC as the only feature for voice activity detection (VAD) does not yield substantial performance improvements. Even with a long-term approach, we observe that VAD performance remains unencouraging when MFCC features are employed. However, other long-term VAD algorithms that do not use MFCC are known to provide a substantial performance improvement at low SNR when the statistics of speech and/or noise vary with time. In this work, we show that applying the MMD filter bank followed by the long-term differential entropy of the voice signal provides significant improvements in VAD accuracy compared with other well-known long-term algorithms. This study thus points to possible benefits of the proposed MMD filter bank for other speech processing applications.
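For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of the conventional MFCC pipeline: a triangular Mel filter bank, a logarithm, and a separate DCT step. It is not the proposed MMD filter bank, whose design is not given in the abstract. The function names, the Mel-scale formula (the common 2595 log10(1 + f/700) convention), and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale convention (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    n_bins = n_fft // 2 + 1
    # Filter edges are equally spaced on the Mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle peaking at `mid`, zero outside (lo, hi).
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii_matrix(n_out, n_in):
    """Unnormalized DCT-II matrix of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc_frame(frame, bank, n_coeffs):
    """Conventional MFCC of one frame: power spectrum, Mel filtering, log, DCT."""
    n_fft = 2 * (bank.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    mel_energies = bank @ power
    log_energies = np.log(mel_energies + 1e-10)  # small floor avoids log(0)
    return dct_ii_matrix(n_coeffs, bank.shape[0]) @ log_energies

# Usage: 13 coefficients from a 25 ms frame of a 440 Hz tone sampled at 16 kHz.
bank = triangular_mel_filter_bank(n_filters=20, n_fft=512, sample_rate=16000)
frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
coeffs = mfcc_frame(frame, bank, n_coeffs=13)
```

In this baseline the Mel filtering and the DCT are two distinct steps separated by the logarithm, which is the arrangement the abstract says the MMD structure replaces with a single operation.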
