Skip to main content

JSTSP Volume 16 Issue 6

<p>JSTSP Volume 16 Issue 6</p>

Issue Title
Self-Supervised Learning for Speech and Audio Processing

Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR

Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods include autoregressive predictive coding (APC), Wav2vec2.0, and hidden unit BERT (HuBERT). However, SSL models are biased to the pretraining data. When SSL models are finetuned with data from another domain, domain shifting occurs and might cause limited knowledge transfer for downstream tasks.

Read more

Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models

Speech self-supervised learning has attracted much attention due to its promising performance in multiple downstream tasks, and has become a new growth engine for speech recognition in low-resource languages. In this paper, we exploit and analyze a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge.

Read more

Self-Supervised Language Learning From Raw Audio: Lessons From the Zero Resource Speech Challenge

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. 

Read more

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. 

Read more

Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing

The papers in this special section focus on self-supervised learning for speech and audio processing. A current trend in the machine learning community is the adoption of self-supervised approaches to pretrain deep networks. Self-supervised learning utilizes proxy-supervised learning tasks (or pretext tasks) - for example, distinguishing parts of the input signal from distractors or reconstructing masked input segments conditioned on unmasked segments—to obtain training data from unlabeled corpora. 

Read more