Self-Supervised Speech Representation Learning: A Review

You are here

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

Self-Supervised Speech Representation Learning: A Review

By: 
Abdelrahman Mohamed; Hung-yi Lee; Lasse Borgholt; Jakob D. Havtorn; Joakim Edin; Christian Igel; Katrin Kirchhoff; and more...

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition.

SPS on Twitter

  • New SPS Webinar: On Wednesday, 8 February, join Dr. Roula Nassif for "Decentralized learning over multitask graphs"… https://t.co/GOgHb7vfAv
  • CALL FOR PAPERS: IEEE Signal Processing Magazine welcomes submissions for a Special Issue on Hypercomplex Signal an… https://t.co/UDvjUY2llT
  • New SPS Webinar: On 15 February, join Mr. Wei Liu, Dr. Li Chen and Dr. Wenyi Zhang presenting "Decentralized Federa… https://t.co/em0sQAK4V5
  • New SPS Webinar: On Monday, 13 February, join Dr. Joe (Zhou) Ren when he presents "Human Centric Visual Analysis -… https://t.co/Rc39HpkPKr
  • Help us illustrate the SPS story! In honor of our 75th anniversary, we need your support to capture the people, mem… https://t.co/MnYU9MzIok

SPS Videos


Signal Processing in Home Assistants

 


Multimedia Forensics


Careers in Signal Processing             

 


Under the Radar