Kaewtip, Kantapon. University of California, Los Angeles (2017). "Robust Automatic Recognition of Birdsongs and Human Speech: A Template-Based Approach"



Kaewtip, Kantapon. University of California, Los Angeles (2017). "Robust Automatic Recognition of Birdsongs and Human Speech: A Template-Based Approach." Advisor: Alwan, Abeer.

This dissertation focuses on robust signal processing algorithms for birdsong and speech signals. Automatic detection of bird phrases or syllables is useful in several applications. However, bird-phrase detection is challenging due to segmentation error, duration variability, limited training data, and background noise. Two spectrograms with identical class labels may look different due to time misalignment and frequency variation. In real recording environments, such as a forest, the data can be corrupted by background interference: rain, wind, other animals, or even other birds vocalizing. A noise-robust classifier needs to handle such conditions. Similarly, automatic speech recognition (ASR) works well in quiet environments, but performance degrades substantially when the speech signal is corrupted by background noise. ASR would therefore benefit from robust representations of speech signals and from robust recognition systems.

The first topic of this dissertation is an automatic birdsong-phrase recognition system that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust dynamic time warping (DTW)-based segmentation stage and a discriminative classifier for outlier rejection. It uses DTW together with the prominent (high-energy) time-frequency regions of the training spectrograms to derive a reliable, noise-robust template for each phrase class. The resulting template is then used to segment continuous recordings into candidate segments, whose spectrogram amplitudes in the prominent regions serve as features for a support vector machine (SVM). In addition, the author presents a novel approach to training HMMs with extremely limited data. First, the algorithm learns a global Gaussian mixture model (GMM) from all available training phrases. The GMM parameters are then used to initialize the state parameters of each individual model. The number of states and the number of mixture components per state are determined by the acoustic variation of each phrase type. The prominent (high-energy) time-frequency regions are used to compute the state emission probabilities, which increases noise robustness.
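The prominent-region DTW idea can be illustrated with a minimal sketch. This is not the dissertation's implementation: the helper names (`prominent_mask`, `dtw_distance`), the top-energy quantile threshold, and the per-frame L1 distance are all assumptions, and spectrograms are assumed to be 2-D NumPy arrays (frequency × time).

```python
import numpy as np

def prominent_mask(template, keep_frac=0.2):
    """Binary mask keeping the top `keep_frac` highest-energy
    time-frequency bins of a template spectrogram (freq x time).
    The 20% default is an illustrative assumption."""
    thresh = np.quantile(template, 1.0 - keep_frac)
    return template >= thresh

def dtw_distance(template, segment, mask):
    """DTW between two spectrograms (freq x time), where the local
    frame distance counts only bins inside the prominent-region mask,
    so low-energy (noise-dominated) bins do not affect alignment."""
    n, m = template.shape[1], segment.shape[1]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        w = mask[:, i - 1]  # prominent bins of this template frame
        for j in range(1, m + 1):
            # L1 distance restricted to prominent bins (an assumption;
            # a frame with no prominent bins contributes zero cost)
            cost = np.sum(np.abs(template[w, i - 1] - segment[w, j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In use, one would slide the template over a continuous recording and keep windows whose masked DTW distance falls below a threshold as segment candidates for the SVM stage.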

The second topic of the dissertation deals with noise-robust processing for automatic speech recognition. The author proposes a new pitch-based spectral enhancement algorithm, operating on voiced frames, for speech analysis and noise-robust speech processing. The algorithm simultaneously determines a time-warping function (TWF) and the speaker's pitch with high precision. This reduces the smearing between harmonics that occurs when the fundamental frequency is not constant within the analysis window. To do so, the author defines a metric called the harmonic residual, which measures the difference between the actual spectrum and the spectrum resynthesized from the linear model of speech production, evaluated over various combinations of TWF and high-precision pitch values. The TWF-pitch pair that yields the minimum harmonic residual is selected, and the enhanced spectrum is obtained accordingly. The author also shows how this representation can be used for automatic speech recognition, by proposing a robust spectral representation derived from harmonic-amplitude interpolation.
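The minimum-residual selection can be sketched in a deliberately simplified form. The sketch below omits the TWF and the linear speech-production resynthesis entirely; it stands in for the harmonic residual with the spectral energy left unexplained at the candidate pitch's harmonic bins, and all function names and the candidate grid are assumptions, not the dissertation's method.

```python
import numpy as np

def harmonic_residual(spectrum, f0, sr, n_fft):
    """Spectral energy left unexplained after removing the bins nearest
    each harmonic of candidate pitch f0. A crude stand-in for comparing
    the actual spectrum against a model-resynthesized spectrum."""
    mag2 = np.abs(spectrum) ** 2
    total = mag2.sum()
    explained = 0.0
    k = 1
    while k * f0 < sr / 2:  # harmonics up to Nyquist
        b = int(round(k * f0 * n_fft / sr))
        if b < len(mag2):
            explained += mag2[b]
        k += 1
    return total - explained

def estimate_pitch(frame, sr, f0_grid):
    """Pick the candidate f0 whose harmonic residual is smallest
    (the dissertation searches jointly over TWF and pitch; this
    sketch searches over pitch alone)."""
    n_fft = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n_fft))
    residuals = [harmonic_residual(spec, f0, sr, n_fft) for f0 in f0_grid]
    return f0_grid[int(np.argmin(residuals))]
```

Note that a real system would need to guard against octave errors (a subharmonic candidate explains every true harmonic bin and more), e.g. by penalizing the number of harmonics used; the sketch sidesteps this by assuming a candidate grid without subharmonics.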
