Self-supervised representation learning (SSRL) methods aim to provide powerful, deep feature learning without the requirement of large annotated data sets, thus alleviating the annotation bottleneck, one of the main barriers to the practical deployment of deep learning today. These techniques have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pretraining alternatives across a variety of data modalities, including image, video, sound, text, and graphs. This article introduces this vibrant area, including key concepts, the four main families of approaches and associated state-of-the-art techniques, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and computational cost. Finally, we survey major open challenges in the field that provide fertile ground for future work.
Deep neural networks (DNNs) now underpin state-of-the-art artificial intelligence (AI) systems for analysis of diverse data types. However, the conventional paradigm has been to train these systems using supervised learning, where performance has grown roughly logarithmically with annotated data set sizes. The cost of such annotation has proven to be a scalability bottleneck for the continued advancement of state-of-the-art performance, and a more fundamental barrier for the deployment of DNNs in application areas where data and annotations are intrinsically rare, costly, dangerous, or time consuming to collect.
This situation has motivated a wave of research in SSRL, where freely available labels from carefully designed pretext tasks are used as supervision to discriminatively train deep representations. The resulting representations can then be reused for training a DNN to solve a downstream task of interest using less task-specific annotated data than conventional supervised learning requires.
Self-supervision refers to learning tasks that ask a DNN to predict one part of the input data (or a label programmatically derivable from it) given another part of the input. This is in contrast to supervised learning, which asks the DNN to predict a manually provided target output, and to generative modeling, which asks a DNN to estimate the density of the input data or learn a generator for input data. Self-supervised algorithms differ primarily in their strategy for defining the derived labels to predict. This choice of pretext task determines the (in)variances of the resulting learned representation and thus how effective it is for different downstream tasks.
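To make the idea of a programmatically derived label concrete, the following minimal sketch builds a training batch for one well-known pretext task, rotation prediction, in which each unlabeled image is rotated by a random multiple of 90 degrees and the rotation index serves as the target. The choice of pretext task, the function name `make_rotation_batch`, and the random stand-in images are assumptions made here for illustration; they are not taken from the article.

```python
import numpy as np


def make_rotation_batch(images, rng):
    """Derive (input, label) pairs from unlabeled images.

    Hypothetical helper for illustration: each image is rotated by
    k * 90 degrees, and k (0..3) becomes the label to predict.
    No manual annotation is involved.
    """
    # One rotation index per image, drawn uniformly from {0, 1, 2, 3}.
    labels = rng.integers(0, 4, size=len(images))
    # np.rot90 rotates in the plane of the first two axes (height, width).
    # Square images keep their shape, so the results can be stacked.
    rotated = np.stack(
        [np.rot90(img, k=int(k)) for img, k in zip(images, labels)]
    )
    return rotated, labels


rng = np.random.default_rng(0)
# Stand-in for an unlabeled data set: 8 square grayscale images.
images = rng.random((8, 32, 32))
x, y = make_rotation_batch(images, rng)
```

A classifier trained to predict `y` from `x` receives supervision derived purely from the data. Note that a representation trained this way is encouraged to be sensitive to orientation, which illustrates how the choice of pretext task shapes the (in)variances that are learned.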