Learning of Continuous and Piecewise-Linear Functions With Hessian Total-Variation Regularization


By: Joaquim Campos; Shayan Aziznejad; Michael Unser

We develop a novel 2D functional learning framework that employs a sparsity-promoting regularization based on second-order derivatives. Motivated by the nature of the regularizer, we restrict the search space to the span of piecewise-linear box splines shifted on a 2D lattice. Our formulation of the infinite-dimensional problem on this search space allows us to recast it exactly as a finite-dimensional one that can be solved using standard methods in convex optimization. Since our search space is composed of continuous and piecewise-linear functions, our work presents itself as an alternative to training networks that deploy rectified linear units, which also construct models in this family. The advantages of our method are fourfold: the ability to enforce sparsity, favoring models with fewer piecewise-linear regions; the use of a rotation, scale and translation-invariant regularization; a single hyperparameter that controls the complexity of the model; and a clear model interpretability that provides a straightforward relation between the parameters and the overall learned function. We validate our framework in various experimental setups and compare it with neural networks.

The primary task in supervised learning is to estimate a target function $f:\mathbb{R}^d \to \mathbb{R}$ from finitely many noisy samples $\{x_m, y_m\}_{m=1}^{M}$, where $y_m \approx f(x_m)$, $m = 1, \ldots, M$ [1]. Since there are arbitrarily many continuous models that can fit the training data well enough, this problem is ill-posed in general. To address this issue, the learning scheme generally includes regularization and favors certain models based on prior information on the target function [2][3].
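As an illustration of how regularization removes the ill-posedness, a generic regularized learning problem can be written as below. This is a sketch in standard notation and not necessarily the exact formulation used by the authors: the loss $E$, the regularizer $\mathcal{R}$, the search space $\mathcal{F}$, and the weight $\lambda$ are symbols assumed here for illustration.

```latex
\min_{f \in \mathcal{F}} \; \sum_{m=1}^{M} E\bigl(y_m, f(x_m)\bigr) + \lambda \, \mathcal{R}(f),
\qquad \lambda > 0 .
```

The first term measures how well $f$ fits the samples $\{x_m, y_m\}_{m=1}^{M}$, while the second penalizes models that disagree with the prior information. A larger $\lambda$ favors simpler models at the cost of data fidelity.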

One way to make this problem computationally tractable is to restrict the admissible solutions to a given family of parametric functions $f_\Theta$, where $\Theta$ denotes the vector of the underlying parameters. A celebrated example of this approach is deep learning, whose underlying principle is the construction of an overall map $f_\Theta:\mathbb{R}^d \to \mathbb{R}$ built as a neural network via the composition of parameterized affine mappings and pointwise nonlinearities known as activation functions. The attribute “deep” refers to the high number of such module compositions (layers), which is instrumental in improving the approximation power of the network [4][5][6] and its generalization ability [7].
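The composition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the names `relu`, `init_params`, and `f_theta`, the layer sizes, and the random initialization are all assumptions made for the example. The activation is taken to be the rectified linear unit mentioned in the abstract, which makes the resulting map continuous and piecewise-linear.

```python
import numpy as np

def relu(z):
    """Pointwise nonlinearity (activation function): max(z, 0)."""
    return np.maximum(z, 0.0)

def init_params(layer_sizes, seed=0):
    """Draw random weights and biases; Theta is the list of (W, b) pairs.

    Hypothetical helper: the sizes and the Gaussian initialization are
    illustrative choices, not taken from the article.
    """
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def f_theta(x, theta):
    """Evaluate the network at a point x in R^d and return a scalar."""
    h = np.asarray(x, dtype=float)
    for W, b in theta[:-1]:
        h = relu(W @ h + b)      # affine mapping followed by pointwise ReLU
    W, b = theta[-1]
    return (W @ h + b).item()    # final affine mapping, output in R

theta = init_params([2, 8, 8, 1])    # d = 2, two hidden layers of width 8
value = f_theta([0.5, -1.0], theta)  # a single real number
```

Each hidden layer adds one more affine-plus-activation module to the composition, so the depth of the network is simply the length of `theta`.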
