Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

By Peter Bell; Joachim Fainberg; Ondrej Klejch; Jinyu Li; Steve Renals; Pawel Swietojanski

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation. We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature.

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

The performance of automatic speech recognition (ASR) systems has improved dramatically in recent years thanks to the availability of larger training datasets, the development of neural network-based models, and the computational power to train such models on these datasets [1][2][3][4]. However, the performance of ASR systems can still degrade rapidly when their conditions of use (test conditions) differ from the conditions represented in the training data. There are several causes for this mismatch, including speaker differences, variability in the acoustic environment, and differences in the domain of use.