# Deep Learning Meets Sparse Regularization: A signal processing perspective

## You are here

### Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits

## signal_general.jpg

By:
Rahul Parhi; Robert D. Nowak

Deep learning (DL) has been wildly successful in practice, and most of the state-of-the-art machine learning methods are based on neural networks (NNs). Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep NNs (DNNs). In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of DL. This framework precisely characterizes the functional properties of NNs that are trained to fit to data. The key mathematical tools that support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, which are all techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in NN training, use of skip connections and low-rank weight matrices in network architectures, role of sparsity in NNs, and explains why NNs can perform well in high-dimensional problems.

### Introduction

DL has revolutionized engineering and the sciences in the modern data age. The typical goal of DL is to predict an output yY (e.g., a label or response) from an input xX (e.g., a feature or example). An NN is “trained” to fit to a set of data consisting of the pairs {(xn,yn)}Nn=1 by finding a set of NN parameters θ so that the NN mapping closely matches the data. The trained NN is a function, denoted by fθ:XY, that can be used to predict the output yY of a new input xX. This paradigm is referred to as supervised learning, which is the focus of this article. The success of DL has spawned a burgeoning industry that is continually developing new applications, NN architectures, and training algorithms. This article reviews recent developments in the mathematics of DL, focused on the characterization of the kinds of functions learned by NNs fit to data. There are currently many competing theories that explain the success of DL. These developments are part of a wider body of theoretical work that can be crudely organized into three broad categories: 1) approximation theory with NNs, 2) the design and analysis of optimization (“training”) algorithms for NNs, and 3) characterizations of the properties of trained NNs.

### SPS on Twitter

• DEADLINE EXTENDED: The 2023 IEEE International Workshop on Machine Learning for Signal Processing is now accepting… https://t.co/NLH2u19a3y
• ONE MONTH OUT! We are celebrating the inaugural SPS Day on 2 June, honoring the date the Society was established in… https://t.co/V6Z3wKGK1O
• The new SPS Scholarship Program welcomes applications from students interested in pursuing signal processing educat… https://t.co/0aYPMDSWDj
• CALL FOR PAPERS: The IEEE Journal of Selected Topics in Signal Processing is now seeking submissions for a Special… https://t.co/NPCGrSjQbh
• Test your knowledge of signal processing history with our April trivia! Our 75th anniversary celebration continues:… https://t.co/4xal7voFER

### SPS Videos

Signal Processing in Home Assistants

Multimedia Forensics

Careers in Signal Processing