Robust Subspace Learning

July 2018

Robust Subspace Learning

0718_A5_spm.jpg

By:

Namrata Vaswani, Thierry Bouwmans, Sajid Javed, Praneeth Narayanamurthy

Principal component analysis (PCA) is one of the most widely used dimension reduction techniques. A related easier problem is termed subspace learning or subspace estimation. Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning (RSL) or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of RSL and tracking. In particular, solutions for three problems are discussed in detail:

RPCA via sparse+low-rank (S+LR) matrix decomposition
RST via S+LR
robust subspace recovery (RSR).

RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and, hence, are well modeled as sparse corruptions.

PCA finds application in a variety of scientific and data analytics’ problems ranging from exploratory data analysis and classification, e.g., image retrieval or face recognition, to a variety of modern applications such as video analytics, recommendation system design, and understanding social networks

dynamics. PCA finds a small number of orthogonal basis vectors, called principal components (PCs), along which most of the variability of the data set lies. In most applications, only the subspace spanned by the PCs (principal subspace) is of interest. For example, this is all that is needed for dimension reduction. This easier problem is typically called subspace learning or subspace estimation: the two terms are sometimes used interchangeably. Given a matrix of clean data, PCA is easily accomplished via SVD on the data matrix. The same problem becomes much harder if the data is corrupted by even a few outliers. The reason is that SVD is sensitive to outliers. We show an example of this in Figure 1. In today’s big data age, since data is often acquired using a large number of inexpensive sensors, outliers are becoming even more common. They occur due to various reasons such as node or sensor failures, foreground occlusion of video sequences, or abnormalities or other anomalous behavior on certain nodes of a network. The harder problem of PCA or subspace learning for outlier corrupted data is called RPCA or RSL.

Since the term outlier does not have a precise mathematical meaning, the RPCA problem was, until recently, not well defined. Even so, many classical heuristics existed for solving it, e.g., see [1] and [2] and references therein. In recent years, there have been multiple attempts to qualify this term. Most popular among these is the idea of treating an outlier as an additive sparse corruption, which was popularized in the work of Wright and Ma [3]. This is a valid definition because it models the fact that outliers occur infrequently and allows them to have any magnitude. In particular, their magnitude can be much larger than that of the true data points. Using this definition, the recent work by Candès et al. [4] defined RPCA as the problem of decomposing a given data matrix, M, into the sum of a low-rank matrix, L, whose column subspace gives the PCs and a sparse matrix (outliers’ matrix), S. This definition, which is often referred to as the S+LR formulation, has led to numerous interesting new works on RPCA solutions, many of which are provably correct under simple assumptions, e.g., [4]–[10]. A key motivating application is video analytics: decomposing a video into a slowly changing background video and a sparse foreground video [2], [4]; see an example in Figure 2(a). The background changes slowly, and the changes are usually dense (not sparse). It is thus well modeled as a dense vector that lies in the low-dimensional subspace of the original space [4]. The foreground usually consists of one or more moving objects, and is thus correctly modeled as the sparse outlier.

Read on IEEE Xplore

Tags:

SPM Article

BISP_TC_Webinar.jpg

SPS Webinar: An Anomaly Detection Framework with Compressed Transformer Architecture for Tiny ML

webinar_ASI.jpg

SPS Webinar: Presentation Attack Detection on ID Cards

webinar_general_text.jpg

SPS Webinar: Bilinear Expectation Propagation for Distributed Semi-Blind Joint Channel Estimation and Data Detection in Cell-Free Massive MIMO

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

Robust Subspace Learning

Publications & Resources

Signal Processing Magazine

For Authors

SP-Magazine-Front_Cover-March-2025.jpg

CAI_2027_Call_for_Proposals.png

nominate_2_general.jpg

Top Reasons to Join SPS Today!

Robust Subspace Learning

0718_A5_spm.jpg

SPS Social Media

IEEE SPS Educational Resources

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

Robust Subspace Learning

Search form

You are here

Publications & Resources

Signal Processing Magazine

For Authors

Top Reasons to Join SPS Today!

Robust Subspace Learning

SPS Social Media

IEEE SPS Educational Resources