1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.
Deep learning (DL) has been wildly successful in practice, and most of the state-of-the-art machine learning methods are based on neural networks (NNs). Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep NNs (DNNs). In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of DL. This framework precisely characterizes the functional properties of NNs that are trained to fit to data. The key mathematical tools that support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, which are all techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in NN training, use of skip connections and low-rank weight matrices in network architectures, role of sparsity in NNs, and explains why NNs can perform well in high-dimensional problems.
DL has revolutionized engineering and the sciences in the modern data age. The typical goal of DL is to predict an output
This article belongs to the latter category of research and investigates the functional properties (i.e., the regularity) of solutions to NN training problems with explicit, Tikhonov-type regularization. Although much of the success of DL in practice comes from networks with highly structured architectures, it is hard to establish a rigorous and unified theory for such NNs used in practice. Therefore, we primarily focus on fully connected, feedforward NNs with the popular rectified linear unit (ReLU) activation function. This article introduces a mathematical framework that unifies a line of work from several authors over the last few years that sheds light on the nature and behavior of NN functions that are trained to a global minimizer with explicit regularization. The presented results are just one piece of the puzzle toward developing a mathematical theory of DL. The purpose of this article is, in particular, to provide a gentle introduction to this new mathematical framework, accessible to readers with a mathematical background in signals and systems and applied linear algebra. The framework is based on mathematical tools familiar to the signal processing community, including transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory. It is also related to well-known signal processing ideas such as wavelets, splines, and compressed sensing.