Details on the Positive Group Lasso#
This tutorial shows how to derive the proximity operator and subdifferential of the `\ell_2` penalty, and of the `\ell_2` penalty with nonnegative constraints.
Proximity operator of the group Lasso#
Let

`h : \mathbb{R}^p \to \mathbb{R}, \quad h(x) = \lambda \| x \|_2 ,`

then its Fenchel-Legendre conjugate is

`h^\star(u) = \sup_{x} \ \langle x, u \rangle - \lambda \| x \| = \iota_{\| u \| \leq \lambda}(u) ,` (1)

and for all `x \in \mathbb{R}^p`

`\mathrm{prox}_{h^\star}(x) = \Pi_{\lambda \mathcal{B}_2}(x) = \frac{\lambda x}{\max(\| x \|, \lambda)} .` (2)

Using the Moreau decomposition and Equations (1) and (2), one has

`\mathrm{prox}_h(x) = x - \mathrm{prox}_{h^\star}(x) = \left(1 - \frac{\lambda}{\max(\| x \|, \lambda)}\right) x = \left(1 - \frac{\lambda}{\| x \|}\right)_+ x ,`

which is the block soft-thresholding operator.
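As a quick numerical sanity check, here is a minimal NumPy sketch of this closed form (the function name `prox_group_lasso` is ours for illustration, not skglm's API):

```python
import numpy as np

def prox_group_lasso(x, lam):
    # block soft-thresholding: (1 - lam / ||x||)_+ x
    norm_x = np.linalg.norm(x)
    if norm_x <= lam:
        return np.zeros_like(x)
    return (1 - lam / norm_x) * x

x = np.array([3.0, -4.0])            # ||x|| = 5
print(prox_group_lasso(x, lam=2.0))  # [1.8, -2.4]: direction kept, norm shrunk to 3
```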
A similar formula can be derived for the group Lasso with nonnegative constraints.
Proximity operator of the group Lasso with positivity constraints#
Let

`h : \mathbb{R}^p \to \mathbb{R}, \quad h(x) = \lambda \| x \|_2 + \iota_{x \geq 0} .` (3)

Let `x \in \mathbb{R}^p` and `S = \{ j \in \{1, \ldots, p\} \mid x_j > 0 \}` the set of its strictly positive coordinates, then

`h^\star(x) = \sup_{z \geq 0} \ \langle z, x \rangle - \lambda \| z \| = \iota_{\| x_S \| \leq \lambda}(x) ,` (4)

and

`\mathrm{prox}_{h^\star}(x)_j = x_j \ \ \text{if} \ j \notin S , \qquad \mathrm{prox}_{h^\star}(x)_j = \frac{\lambda x_j}{\max(\| x_S \|, \lambda)} \ \ \text{if} \ j \in S .` (5)

As before, using the Moreau decomposition and Equation (3) yields

`\mathrm{prox}_h(x) = x - \mathrm{prox}_{h^\star}(x) ,`

and thus, combined with Equations (4) and (5), it leads to

`\mathrm{prox}_h(x)_j = 0 \ \ \text{if} \ j \notin S , \qquad \mathrm{prox}_h(x)_j = \left(1 - \frac{\lambda}{\| x_S \|}\right)_+ x_j \ \ \text{if} \ j \in S ,`

or, more compactly, `\mathrm{prox}_h(x) = \left(1 - \frac{\lambda}{\| [x]_+ \|}\right)_+ [x]_+` where `[x]_+ = \max(x, 0)` componentwise.
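In words: clip the negative coordinates to zero, then block soft-threshold what remains. A minimal NumPy sketch of this operator (again, the function name is ours):

```python
import numpy as np

def prox_pos_group_lasso(x, lam):
    # clip to the nonnegative orthant, then block soft-threshold
    x_pos = np.maximum(x, 0)
    norm_pos = np.linalg.norm(x_pos)
    if norm_pos <= lam:
        return np.zeros_like(x)
    return (1 - lam / norm_pos) * x_pos

x = np.array([3.0, -1.0, 4.0])           # [x]_+ = [3, 0, 4], norm 5
print(prox_pos_group_lasso(x, lam=2.0))  # [1.8, 0.0, 2.4]
```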
Subdifferential of the positive Group Lasso penalty#
For the subdiff_diff working set strategy, we need to compute the distance `D(v)` of a vector `v` to the subdifferential of the penalty `h` at a point `w`.
Since the penalty is group-separable, we can reduce to the case where `w` is a single block of variables in `\mathbb{R}^g`.
Case `w \notin \mathbb{R}_+^g`#
If any component of `w` is strictly negative, the subdifferential is empty, and the distance is `+ \infty`.
Case `w = 0`#
At `w = 0`, the subdifferential is

`\partial h(0) = \lambda \mathcal{B}_2 + \mathbb{R}_-^g ,`

where `\mathcal{B}_2` is the unit ball and `\mathbb{R}_-^g` the nonpositive orthant (the normal cone of `\mathbb{R}_+^g` at 0).

Therefore, the distance of `v` to the subdifferential writes

`D(v) = \min_{u \in \lambda \mathcal{B}_2, \, n \in \mathbb{R}_-^g} \ \| v - u - n \| .`
Minimizing over `n`, then over `u`, thanks to [1], yields

`D(v) = \max(0, \| v^+ \| - \lambda) ,`
where `v^+ = \max(v, 0)` is `v` restricted to its positive coordinates. Intuitively, if `v_i < 0`, we can cancel it exactly in the objective by taking `n_i = v_i` and `u_i = 0`; on the other hand, if `v_i > 0`, taking a nonzero `n_i` only increases the quantity that `u_i` needs to bring close to 0.
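The closed form is easy to check against the definition of the distance. Below is a sketch that compares it with a brute-force minimization over `u \in \lambda \mathcal{B}_2` and `n \leq 0` (the helper name `dist_at_zero` is ours; SciPy's SLSQP handles the ball constraint and the sign bounds):

```python
import numpy as np
from scipy.optimize import minimize

def dist_at_zero(v, lam):
    # closed form: max(0, ||v^+|| - lam)
    return max(0.0, np.linalg.norm(np.maximum(v, 0)) - lam)

v, lam = np.array([0.5, -1.0, 0.4]), 0.3
g = v.size

# brute force: min ||v - u - n|| over ||u|| <= lam and n <= 0
res = minimize(
    lambda z: np.linalg.norm(v - z[:g] - z[g:]),
    np.zeros(2 * g),
    constraints=[{"type": "ineq", "fun": lambda z: lam - np.linalg.norm(z[:g])}],
    bounds=[(None, None)] * g + [(None, 0)] * g,
)
print(res.fun, dist_at_zero(v, lam))  # both ~0.340
```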
For a rigorous derivation, introduce the Lagrangian of a squared version of the objective,

`\mathcal{L}(u, n, \nu, \mu) = \frac{1}{2} \| v - u - n \|^2 + \frac{\nu}{2} \left( \| u \|^2 - \lambda^2 \right) + \langle \mu, n \rangle , \quad \nu \geq 0 , \ \mu \geq 0 ,`

and write down the optimality conditions with respect to `u` and `n`. Treat the case `\nu = 0` separately; in the other case, show that `u` must be positive and that `v = (1 + \nu) u + n`, which, together with `u = \mu / \nu` and complementary slackness, leads to the conclusion.
Case `|| w || \ne 0`#
The subdifferential in that case is `\lambda w / \| w \| + C_1 \times \ldots \times C_g`, where `C_j = \{0\}` if `w_j > 0` and `C_j = \mathbb{R}_-` otherwise (`w_j = 0`).
By letting `p` denote the projection of `v` onto this set, one has

`p_j = \frac{\lambda w_j}{\| w \|} \ \ \text{if} \ w_j > 0 ,`

and

`p_j = \min(v_j, 0) \ \ \text{if} \ w_j = 0 .`
The distance to the subdifferential is then

`D(v) = \| v - p \| = \sqrt{ \sum_{j \, : \, w_j > 0} \left( v_j - \frac{\lambda w_j}{\| w \|} \right)^2 + \sum_{j \, : \, w_j = 0} \max(0, v_j)^2 } ,`
since `v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j)`.
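Putting the three cases together, here is a minimal sketch of the full distance computation for one group (a didactic helper, not skglm's internal implementation):

```python
import numpy as np

def dist_subdiff(w, v, lam):
    if np.any(w < 0):
        # case w not in R_+^g: the subdifferential is empty
        return np.inf
    if not np.any(w):
        # case w = 0: max(0, ||v^+|| - lam)
        return max(0.0, np.linalg.norm(np.maximum(v, 0)) - lam)
    # case ||w|| != 0: project v onto lam * w / ||w|| + C_1 x ... x C_g
    p = np.where(w > 0, lam * w / np.linalg.norm(w), np.minimum(v, 0))
    return np.linalg.norm(v - p)

w = np.array([0.0, 2.0, 0.0])
v = np.array([0.3, -0.5, -0.2])
print(dist_subdiff(w, v, lam=1.0))  # distance for a partly zero block
```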