Critical regularization strength above which the solution is 0
This tutorial shows that for `\lambda \geq \lambda_{\text{max}} = || \nabla f(0) ||_{\infty}`, the solution to `\min f(x) + \lambda || x ||_1` is 0.
In skglm, we thus frequently use `alpha_max = np.max(np.abs(gradient0))` and choose the regularization strength `\alpha` as a fraction of this critical value, e.g. `alpha = 0.01 * alpha_max`.
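Here is a minimal sketch of that computation, assuming the least-squares datafit `f(x) = \frac{1}{2n} ||Ax - b||_2^2` worked out in the example below; the array names `A`, `b`, the random data, and the 0.01 fraction are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 100
A = rng.standard_normal((n, d))   # design matrix (n samples, d features)
b = rng.standard_normal(n)        # target vector

# Gradient of f(x) = ||Ax - b||^2 / (2 n) evaluated at x = 0
gradient0 = -A.T @ b / n

# Critical regularization strength: at or above it, the solution is exactly 0
alpha_max = np.max(np.abs(gradient0))

# A typical choice: a small fraction of alpha_max
alpha = 0.01 * alpha_max
```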
Problem setup
Consider the optimization problem:

`\min_{x \in \mathbb{R}^d} f(x) + \lambda || x ||_1`

where:
`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
`|| x ||_1` is the L1 norm of `x`,
`\lambda > 0` is the regularization parameter.
We aim to determine the conditions under which the solution to this problem is `x = 0`.
Theoretical background
Let

`g(x) = f(x) + \lambda || x ||_1 .`

According to Fermat’s rule, 0 is the minimizer of `g` if and only if 0 is in the subdifferential of `g` at 0. Since `f` is differentiable, `\partial g(0) = \nabla f(0) + \lambda \partial || \cdot ||_1 (0)`, and the subdifferential of `|| x ||_1` at 0 is the L-infinity unit ball:

`\partial || \cdot ||_1 (0) = \{ u \in \mathbb{R}^d : || u ||_{\infty} \leq 1 \} .`

Thus,

`0 \in \partial g(0) \Leftrightarrow - \nabla f(0) \in \lambda \partial || \cdot ||_1 (0) \Leftrightarrow || \nabla f(0) ||_{\infty} \leq \lambda .`
We have just shown that the minimizer of `g = f + \lambda || \cdot ||_1` is 0 if and only if `\lambda \geq ||\nabla f(0)||_{\infty}`.
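As a sanity check, the sketch below verifies this criterion numerically with a basic proximal gradient (ISTA) loop on the least-squares loss of the next section; the step size, problem dimensions, and solver are illustrative choices, not skglm internals.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(0)
n, d = 50, 30
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

grad_at_zero = -A.T @ b / n
lambda_max = np.max(np.abs(grad_at_zero))

# Lipschitz constant of the gradient of f(x) = ||Ax - b||^2 / (2 n)
L = np.linalg.norm(A, ord=2) ** 2 / n

for lam in (1.1 * lambda_max, 0.5 * lambda_max):
    x = np.zeros(d)
    for _ in range(2000):  # proximal gradient (ISTA) iterations
        grad = A.T @ (A @ x - b) / n
        x = soft_threshold(x - grad / L, lam / L)
    print(f"lambda = {lam:.4f}, ||x||_inf = {np.max(np.abs(x)):.2e}")

# Expected: x stays (numerically) 0 when lambda >= lambda_max,
# and becomes nonzero when lambda < lambda_max.
```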
Example
Consider the loss function for Ordinary Least Squares `f(x) = \frac{1}{2n} ||Ax - b||_2^2`, where `n` is the number of samples. We have:

`\nabla f(x) = \frac{1}{n} A^T (Ax - b)`

At `x = 0`:

`\nabla f(0) = - \frac{1}{n} A^T b`

The infinity norm of the gradient at 0 is:

`|| \nabla f(0) ||_{\infty} = \frac{1}{n} || A^T b ||_{\infty}`
For `\lambda \geq \frac{1}{n}||A^T b||_{\infty}`, the solution to `\min_x \frac{1}{2n} ||Ax - b||_2^2 + \lambda || x||_1` is `x=0`.
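To confirm this numerically, here is a minimal sketch, assuming skglm's `Lasso` estimator follows the usual scikit-learn API with `alpha` and `fit_intercept` parameters (the random data and the 0.99 factor are arbitrary illustrations).

```python
import numpy as np
from skglm import Lasso

rng = np.random.default_rng(0)
n, d = 50, 100
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# lambda_max = ||A^T b||_inf / n for the loss ||Ax - b||^2 / (2 n)
alpha_max = np.max(np.abs(A.T @ b)) / n

# At alpha = alpha_max, all estimated coefficients should be 0
coef_at_max = Lasso(alpha=alpha_max, fit_intercept=False).fit(A, b).coef_
print(np.all(coef_at_max == 0))    # expected: True

# Slightly below alpha_max, some coefficients become nonzero
coef_below = Lasso(alpha=0.99 * alpha_max, fit_intercept=False).fit(A, b).coef_
print(np.any(coef_below != 0))     # expected: True
```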
References
Refer to Section 3.1 of [1], and in particular Proposition 4, for more details.
[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.