Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR

JSTSP Volume 16 Issue 6

By: Ruchao Fan; Yunzheng Zhu; Jinhan Wang; Abeer Alwan

Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods include autoregressive predictive coding (APC), Wav2vec2.0, and hidden unit BERT (HuBERT). However, SSL models are biased toward their pretraining data. When an SSL model is finetuned with data from another domain, domain shift occurs and can limit knowledge transfer to downstream tasks. In this paper, we propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shift in pretrained speech models, and evaluate it on both causal and non-causal transformers. For the causal transformer, an extension of APC (E-APC) is proposed to learn richer information from unlabelled data by using multiple temporally-shifted sequences to perform prediction. For the non-causal transformer, various solutions for using the bidirectional APC (Bi-APC) are investigated. In addition, the DRAFT framework is examined for the Wav2vec2.0 and HuBERT methods, which use non-causal transformers as the backbone. The experiments are conducted on child ASR (using the OGI and MyST databases) with SSL models trained on unlabelled adult speech data from Librispeech. Relative word error rate (WER) improvements of up to 19.7% are observed on the two child tasks when compared to the pretrained models without adaptation. With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) when compared to models trained without pretraining.
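The abstract reports its results as relative WER improvements. As a reading aid, the minimal sketch below shows how a word error rate and a relative improvement are conventionally computed. It is not taken from the paper: the function names `word_error_rate` and `relative_improvement` are hypothetical, the transcripts and WER values in the usage lines are made up for illustration, and the paper's own scoring setup (text normalization, scoring tool) is not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the word-level Levenshtein distance divided by the
    number of words in the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER improvement in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not drawn from OGI or MyST.
    print(word_error_rate("the cat sat down", "the bat sat"))
    # Hypothetical WER values (in percent), not results from the paper.
    print(relative_improvement(20.0, 16.0))
```

Under this convention, a relative improvement is always stated against a named baseline, which is why the abstract quotes separate figures for the comparison against pretrained models without adaptation and for the comparison against models without pretraining.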

Despite impressive advances in automatic speech recognition (ASR) techniques over the last decade, children's ASR remains difficult. Challenges arise, in part, from difficulties in acoustic and language modeling of child speech. Because of children's differing growth patterns and motor control issues, child speech has a higher degree of intra-speaker and inter-speaker acoustic variability than adult speech [1]. Additionally, child speech is characterized by significant mispronunciations and disfluencies [2], [3]. Another challenge is the lack of large-scale, publicly available child speech databases, so child ASR can be treated as a low-resource task [4].
