What Should We Learn? Rethinking PCA for Modern Data Sets

News and Resources for Members of the IEEE Signal Processing Society

September 2018

Publications News

By: Yang Li

In today’s age of big and messy data, data are generated everywhere around us. Examples include texts, tweets, network traffic, changing Facebook connections, and video surveillance feeds coming in from one or multiple cameras. Dimension reduction and noise/outlier removal are usually important preprocessing steps before any high-dimensional (big) data set can be used for inference. A common way to do this is by solving the principal component analysis (PCA) problem or its robust extensions. The basic PCA problem has been studied for over a century, since the early work by Pearson in 1901 and Hotelling in 1933. The aim of PCA is to reduce the dimensionality of multivariate data while preserving as much of the relevant information as possible. It is often the first step in various types of exploratory data analysis, predictive modeling, and classification and clustering tasks, and it finds applications in biomedical imaging, computer vision, process fault detection, recommendation system design, and many more domains.

“PCA” refers to the following problem. Given a data set (a set of data vectors or, more generally, a set of data “tensors”) and a dimension k, find the k-dimensional subspace that “best” approximates the given data set. There are various notions of “best”; the traditional one used for classical PCA is either minimum Frobenius norm or minimum spectral norm of the approximation error of the data matrix. PCA without constraints and for clean data is a solved problem. By the Eckart–Young–Mirsky theorem, computing the top k left singular vectors of the data matrix returns the PCA solution. On the other hand, robust PCA, which refers to the problem of PCA in the presence of outliers, is a much harder problem, and one for which provably correct solutions have started appearing only recently. The same is true for dynamic PCA (subspace tracking or streaming PCA), dynamic or recursive robust PCA (robust subspace tracking), PCA and subspace tracking with missing data, the related low-rank matrix completion problem, and sparse PCA. Sparse PCA refers to the PCA problem when the principal components are assumed to be sparse. In fact, even the classical PCA problem with speed or memory constraints is not well understood.
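To make the classical, clean-data case concrete, here is a minimal sketch in Python with NumPy. It assumes the data vectors are stored as the columns of a d x n matrix X, and it uses synthetic data. The function name pca_subspace and all variable names are illustrative and do not come from the special issue. Mean-centering, which is commonly applied before PCA in practice, is omitted to match the problem statement above.

```python
import numpy as np


def pca_subspace(X, k):
    """Return (U_k, s): a d x k orthonormal basis for the span of the top-k
    left singular vectors of X, and all singular values in descending order.

    Assumption: the columns of X are the data vectors.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s


# Synthetic data for illustration: a rank-k signal plus small dense noise.
rng = np.random.default_rng(0)
d, n, k = 20, 200, 3
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
X = basis @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((d, n))

U_k, s = pca_subspace(X, k)

# Best rank-k approximation: project the data onto the estimated subspace.
X_hat = U_k @ (U_k.T @ X)

# Approximation error in the two norms mentioned in the text.
err_fro = np.linalg.norm(X - X_hat, "fro")
err_spec = np.linalg.norm(X - X_hat, 2)

# Eckart-Young-Mirsky: these errors are determined by the discarded
# singular values s[k:], which the checks below compare against.
expected_fro = np.sqrt(np.sum(s[k:] ** 2))
expected_spec = s[k]
```

In this sketch, err_fro should match the root-sum-of-squares of the discarded singular values and err_spec should match the (k+1)-th singular value, which is what the theorem predicts for the optimal rank-k approximation.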

The above issues have become particularly important for modern data sets because 1) the data matrix is often so large that it cannot be directly stored in the computer’s memory (need for streaming solutions); 2) many data sets contain missing entries and/or outlier-corrupted entries (need for matrix completion and robust versions of PCA and subspace recovery); 3) a lot of data arrive sequentially, the data subspace itself may change over time, the entire data set cannot be stored but short batches can be, and decisions are often needed in real time or near real time (need for dynamic PCA and robust PCA and subspace tracking); 4) data are often distributed and stored over multiple locations, and one needs to perform PCA without communicating all the data to a central location (need for distributed PCA); and 5) many types of data are better represented as a tensor data set rather than a vector data set or matrix (need for tensor PCA).
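As a small illustration of the memory constraint in item 1, the sketch below computes a PCA subspace from short batches without ever forming the full data matrix inside the function. It is a simple baseline written for this illustration, not one of the streaming or tracking algorithms surveyed in the special issue. It assumes the dimension d is small enough that a d x d matrix fits in memory and that the subspace does not change over time. The name pca_from_batches is hypothetical.

```python
import numpy as np


def pca_from_batches(batches, k):
    """Estimate the top-k PCA subspace from an iterable of d x m_t batches.

    Accumulates C = sum_t X_t X_t^T, which equals X X^T for the stacked
    data matrix X, and returns the top-k eigenvectors of C. Only one batch
    and one d x d matrix are held in memory at a time.
    """
    C = None
    for X_t in batches:
        G = X_t @ X_t.T
        C = G if C is None else C + G
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, evecs = np.linalg.eigh(C)
    return evecs[:, ::-1][:, :k]


# Synthetic data for illustration only. The full matrix X is kept here just
# so the batch result can be compared against a full SVD.
rng = np.random.default_rng(1)
d, n, k = 10, 600, 2
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
X = basis @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((d, n))

# Feed the data in batches of 100 columns via a generator.
batches = (X[:, i:i + 100] for i in range(0, n, 100))
U_batch = pca_from_batches(batches, k)

# Reference: top-k left singular vectors of the full matrix.
U_full = np.linalg.svd(X, full_matrices=False)[0][:, :k]

# Compare subspaces through their projection matrices, since individual
# basis vectors are only defined up to sign and rotation.
gap = np.linalg.norm(U_batch @ U_batch.T - U_full @ U_full.T)
```

Because the accumulated matrix is exactly X X^T, the two subspaces agree up to numerical precision here. The harder settings listed above (changing subspaces, outliers, missing entries, very large d) are where this baseline stops being adequate.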

As summarized above, PCA and its many extensions are used everywhere. Moreover, PCA is a key intermediate or initialization step in solving many convex and nonconvex optimization problems. All areas of electrical engineering and computer science (EECS) have benefited hugely from, and have contributed significantly to, solutions of PCA and its extensions. Most people in EECS know some aspects of PCA but not all. The special issue Rethinking PCA for Modern Data Sets: Theory, Algorithms, and Applications, published by Proceedings of the IEEE in August 2018, helps bridge this knowledge gap for EECS researchers from various backgrounds.
