Self-Supervised Representation Learning: Introduction, advances, and challenges

By: Linus Ericsson; Henry Gouk; Chen Change Loy; Timothy M. Hospedales

Self-supervised representation learning (SSRL) methods aim to provide powerful, deep feature learning without requiring large annotated data sets, thus alleviating the annotation bottleneck, one of the main barriers to the practical deployment of deep learning today. These techniques have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pretraining alternatives across a variety of data modalities, including image, video, sound, text, and graphs. This article introduces this vibrant area, including key concepts, the four main families of approaches and associated state-of-the-art techniques, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations, including workflows, representation transferability, and computational cost. Finally, we survey major open challenges in the field that provide fertile ground for future work.

Deep neural networks (DNNs) now underpin state-of-the-art artificial intelligence (AI) systems for the analysis of diverse data types. However, the conventional paradigm has been to train these systems using supervised learning, where performance has grown roughly logarithmically with annotated data set size. The cost of such annotation has proven to be a scalability bottleneck for the continued advancement of state-of-the-art performance, and a more fundamental barrier to the deployment of DNNs in application areas where data and annotations are intrinsically rare, costly, dangerous, or time-consuming to collect.

This situation has motivated a wave of research in SSRL, where freely available labels from carefully designed pretext tasks are used as supervision to discriminatively train deep representations. The resulting representations can then be reused for training a DNN to solve a downstream task of interest using less task-specific annotated data than conventional supervised learning requires.

Self-supervision refers to learning tasks that ask a DNN to predict one part of the input data (or a label programmatically derivable from it) given another part of the input. This is in contrast to supervised learning, which asks the DNN to predict a manually provided target output, and to generative modeling, which asks a DNN to estimate the density of the input data or learn a generator for input data. Self-supervised algorithms differ primarily in their strategy for defining the derived labels to predict. This choice of pretext task determines the (in)variances of the resulting learned representation and thus how effective it is for different downstream tasks.
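
To make the idea of a programmatically derived label concrete, the sketch below builds training pairs for one possible pretext task: predicting how many quarter turns were applied to an image. Rotation prediction is chosen here purely for illustration and is not a method taken from the text above; the function name `make_rotation_pretext_batch`, the random stand-in images, and the use of NumPy are all assumptions of this example.

```python
import numpy as np


def make_rotation_pretext_batch(images, rng):
    """Derive (input, label) pairs without any manual annotation.

    images: array of shape (N, H, W) with H == W (square, so every
            rotated image keeps the same shape and can be stacked).
    Returns the rotated images and integer labels k in {0, 1, 2, 3},
    where each image was rotated counterclockwise by k * 90 degrees.
    """
    # The "label" is simply the transformation we chose to apply.
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack(
        [np.rot90(img, k=int(k)) for img, k in zip(images, labels)]
    )
    return rotated, labels


# Hypothetical unlabeled data: 8 random 32x32 "images".
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32))

x, y = make_rotation_pretext_batch(images, rng)
print(x.shape, y.shape)
```

A network trained to predict `y` from `x` receives supervision that costs nothing to produce. A representation learned this way would be expected to become sensitive to image orientation rather than invariant to it, which illustrates how the choice of pretext task shapes the (in)variances of what is learned.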
