Deep Learning Meets Sparse Regularization: A signal processing perspective

By: Rahul Parhi and Robert D. Nowak

Deep learning (DL) has been wildly successful in practice, and most of the state-of-the-art machine learning methods are based on neural networks (NNs). Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep NNs (DNNs). In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of DL. This framework precisely characterizes the functional properties of NNs that are trained to fit to data. The key mathematical tools that support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, all of which are techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in NN training, the use of skip connections and low-rank weight matrices in network architectures, and the role of sparsity in NNs, and it explains why NNs can perform well in high-dimensional problems.

Introduction

DL has revolutionized engineering and the sciences in the modern data age. The typical goal of DL is to predict an output y ∈ Y (e.g., a label or response) from an input x ∈ X (e.g., a feature or example). An NN is “trained” to fit to a set of data consisting of the pairs {(x_n, y_n)}_{n=1}^N by finding a set of NN parameters θ so that the NN mapping closely matches the data. The trained NN is a function, denoted by f_θ : X → Y, that can be used to predict the output y ∈ Y of a new input x ∈ X. This paradigm is referred to as supervised learning, which is the focus of this article. The success of DL has spawned a burgeoning industry that is continually developing new applications, NN architectures, and training algorithms. This article reviews recent developments in the mathematics of DL, focused on the characterization of the kinds of functions learned by NNs fit to data. There are currently many competing theories that explain the success of DL. These developments are part of a wider body of theoretical work that can be crudely organized into three broad categories: 1) approximation theory with NNs, 2) the design and analysis of optimization (“training”) algorithms for NNs, and 3) characterizations of the properties of trained NNs.
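
To make the supervised learning setup concrete, the following minimal sketch (in PyTorch; the synthetic data, hidden-layer width, optimizer, and learning rate are our own illustrative choices, not specifics from the article) fits a fully connected ReLU network f_θ to the pairs (x_n, y_n) by minimizing a squared-error data-fitting term:

    import torch
    from torch import nn

    torch.manual_seed(0)

    # Synthetic training data {(x_n, y_n)}_{n=1}^N (placeholder values, for illustration only).
    N, d = 100, 2
    X = torch.randn(N, d)                       # inputs x_n in R^d
    y = torch.sin(X.sum(dim=1, keepdim=True))   # outputs y_n in R

    # A fully connected, feedforward ReLU network f_theta: R^d -> R.
    f_theta = nn.Sequential(
        nn.Linear(d, 64),
        nn.ReLU(),
        nn.Linear(64, 1),
    )

    # "Training": find parameters theta so that f_theta(x_n) closely matches y_n.
    optimizer = torch.optim.Adam(f_theta.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for step in range(1000):
        optimizer.zero_grad()
        loss = loss_fn(f_theta(X), y)           # data-fitting term
        loss.backward()
        optimizer.step()

    # The trained network can now predict the output for a new input x.
    x_new = torch.randn(1, d)
    y_hat = f_theta(x_new)

The explicit regularization discussed next can be added to this kind of training problem by penalizing the parameters θ, most commonly through weight decay.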

This article belongs to the third of these categories and investigates the functional properties (i.e., the regularity) of solutions to NN training problems with explicit, Tikhonov-type regularization. Although much of the success of DL in practice comes from networks with highly structured architectures, it is hard to establish a rigorous and unified theory for such networks. Therefore, we primarily focus on fully connected, feedforward NNs with the popular rectified linear unit (ReLU) activation function. This article introduces a mathematical framework that unifies a line of work, developed by several authors over the last few years, which sheds light on the nature and behavior of NN functions that are trained to a global minimizer with explicit regularization. The presented results are just one piece of the puzzle toward developing a mathematical theory of DL. The purpose of this article is, in particular, to provide a gentle introduction to this new mathematical framework, accessible to readers with a mathematical background in signals and systems and applied linear algebra. The framework is based on mathematical tools familiar to the signal processing community, including transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory. It is also related to well-known signal processing ideas such as wavelets, splines, and compressed sensing.
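
As a point of reference (the notation here is ours, not reproduced from the article), a training problem with explicit, Tikhonov-type regularization trades off a data-fitting loss against a penalty on the network parameters; weight decay corresponds to the squared ℓ2-norm penalty:

    \min_{\theta} \; \sum_{n=1}^{N} \ell\bigl( f_{\theta}(x_n), y_n \bigr) \;+\; \lambda \, \| \theta \|_2^2, \qquad \lambda > 0,

where ℓ is a loss function (e.g., squared error) and λ controls the strength of the regularization. The framework reviewed in this article characterizes what this seemingly simple quadratic penalty does to the learned function f_θ, connecting it to sparse regularization in a transform (Radon) domain.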
