Kaewtip, Kantapon. University of California, Los Angeles (2017) "Robust Automatic Recognition of Birdsongs and Human Speech: a Template-Based Approach"

News and Resources for Members of the IEEE Signal Processing Society

June 2017

PhD Theses

Kaewtip, Kantapon. University of California, Los Angeles (2017) "Robust Automatic Recognition of Birdsongs and Human Speech: a Template-Based Approach", advisor: Alwan, Abeer

This dissertation focuses on robust signal processing algorithms for birdsong and speech signals. Automatic phrase or syllable detection systems for bird sounds are useful in several applications. However, bird-phrase detection is challenging because of segmentation errors, duration variability, limited training data, and background noise. Two spectrograms with identical class labels may look different because of time misalignment and frequency variation. In real recording environments, such as a forest, the data can be corrupted by background interference such as rain, wind, other animals, or even other birds vocalizing, and a noise-robust classifier needs to handle such conditions. Similarly, Automatic Speech Recognition (ASR) works well in quiet environments, but its performance degrades substantially when the speech signal is corrupted by background noise. ASR performance would benefit from robust representations of speech signals and from robust recognition systems.

The first topic of this dissertation is an automatic birdsong-phrase recognition system that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust Dynamic-Time-Warping (DTW)-based segmentation and a discriminative classifier for outlier rejection. The algorithm uses DTW and the prominent (high-energy) time-frequency regions of the training spectrograms to derive a reliable, noise-robust template for each phrase class. The resulting template is then used to segment continuous recordings into segment candidates, whose spectrogram amplitudes in the prominent regions serve as features for a Support Vector Machine (SVM). In addition, the author presents a novel approach to training HMMs with extremely limited data. First, the algorithm learns global Gaussian Mixture Models (GMMs) from all available training phrases. The GMM parameters are then used to initialize the state parameters of each individual model. The number of states and the number of mixture components for each state are determined by the acoustic variation of each phrase type. The prominent (high-energy) time-frequency regions are used to compute the state emission probabilities, which increases noise robustness.
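As a rough illustration of the alignment step, the sketch below implements textbook DTW between a template and a candidate segment, each given as a list of feature frames (for example, spectrogram columns). It is a minimal sketch, not the dissertation's algorithm: the function name `dtw_align`, the Euclidean local cost, and the unconstrained step pattern are assumptions made for illustration, and the restriction to prominent time-frequency regions is not modeled.

```python
import math


def dtw_align(template, segment):
    """Align two sequences of feature frames with classic DTW.

    Each argument is a list of frames; each frame is a list of floats.
    Returns (total_cost, path), where path is a list of
    (template_index, segment_index) pairs from start to end.
    """
    n, m = len(template), len(segment)

    def frame_dist(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning
    # template[:i] with segment[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template advances alone
                cost[i][j - 1],      # segment advances alone
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy usage: the segment repeats the first template frame once.
toy_template = [[0.0], [1.0], [2.0]]
toy_segment = [[0.0], [0.0], [1.0], [2.0]]
toy_cost, toy_path = dtw_align(toy_template, toy_segment)
```

In a template-based detector of this kind, the accumulated cost could serve as a match score, and the path shows which segment frames map onto which template frames before features are read off for a classifier.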

The second topic of the dissertation is noise-robust processing for automatic speech recognition. The author proposes a new pitch-based spectral enhancement algorithm that operates on voiced frames, for speech analysis and noise-robust speech processing. The proposed algorithm simultaneously determines a time-warping function (TWF) and the speaker's pitch with high precision. This technique reduces the smearing between harmonics that occurs when the fundamental frequency is not constant within the analysis window. To do so, the author proposes a metric called the harmonic residual, which measures the difference between the actual spectrum and a resynthesized spectrum derived from the linear model of speech production, with various combinations of TWF and high-precision pitch values as parameters. The TWF and pitch pair that yields the minimum harmonic residual is selected, and the enhanced spectrum is obtained accordingly. The author also shows how this new representation can be used for automatic speech recognition by proposing a robust spectral representation derived from harmonic amplitude interpolation.
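The joint search over pitch and time warping can be sketched as a grid search that keeps the pair with the smallest residual. Everything specific in the sketch below is an assumption made for illustration and is not taken from the dissertation: the warp model tau(t) = t + 0.5 * warp_rate * t^2 (a linearly changing pitch), the time-domain residual in place of a spectral one, the projection-based estimate of harmonic amplitudes, and the names `harmonic_residual` and `best_pitch_and_warp`.

```python
import math


def harmonic_residual(frame, fs, f0, warp_rate, n_harmonics=3):
    """Relative energy left after subtracting a harmonic resynthesis.

    Assumed warp model (illustrative only):
    tau(t) = t + 0.5 * warp_rate * t**2, so the instantaneous pitch
    is f0 * (1 + warp_rate * t).
    """
    n = len(frame)
    # Phase of the fundamental at every sample under the candidate warp.
    theta = []
    for i in range(n):
        t = i / fs
        tau = t + 0.5 * warp_rate * t * t
        theta.append(2.0 * math.pi * f0 * tau)

    resynth = [0.0] * n
    for k in range(1, n_harmonics + 1):
        # Estimate each harmonic's cosine and sine amplitudes by
        # projection; this assumes the harmonics are close to
        # orthogonal over the frame, so it is only approximate.
        a = 2.0 / n * sum(x * math.cos(k * th) for x, th in zip(frame, theta))
        b = 2.0 / n * sum(x * math.sin(k * th) for x, th in zip(frame, theta))
        for i, th in enumerate(theta):
            resynth[i] += a * math.cos(k * th) + b * math.sin(k * th)

    err = sum((x - y) ** 2 for x, y in zip(frame, resynth))
    energy = sum(x * x for x in frame)
    return err / energy if energy > 0 else 0.0


def best_pitch_and_warp(frame, fs, f0_candidates, warp_candidates):
    """Return (residual, f0, warp_rate) for the best candidate pair."""
    best = None
    for f0 in f0_candidates:
        for w in warp_candidates:
            r = harmonic_residual(frame, fs, f0, w)
            if best is None or r < best[0]:
                best = (r, f0, w)
    return best


# Toy usage: a synthetic voiced frame whose pitch rises from 200 Hz.
fs = 8000
demo_frame = []
for i in range(400):
    t = i / fs
    th = 2.0 * math.pi * 200.0 * (t + 0.5 * 2.0 * t * t)
    demo_frame.append(math.cos(th) + 0.5 * math.cos(2.0 * th))
demo_best = best_pitch_and_warp(
    demo_frame, fs, [180.0, 190.0, 200.0, 210.0, 220.0], [0.0, 2.0, 4.0]
)
```

A candidate with the right pitch but no warping leaves more energy unexplained, because the harmonics drift in frequency within the frame; that drift is the smearing the paragraph above describes.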
