Details on the Positive Group Lasso
This tutorial shows how to derive the proximity operator and subdifferential of the `\ell_2` penalty, and of the `\ell_2` penalty with nonnegative constraints.
Proximity operator of the group Lasso
Let
then its Fenchel-Legendre conjugate is
and for all `x \in \mathbb{R}^p`
Using the Moreau decomposition, Equations (1) and (2), one has
A similar formula can be derived for the group Lasso with nonnegative constraints.
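The Moreau decomposition above yields the classical block soft-thresholding formula, `\mathrm{prox}_{\lambda ||\cdot||}(x) = (1 - \lambda / ||x||)_+ \, x`, which is easy to sanity-check numerically. A minimal NumPy sketch (the function name is ours, chosen for illustration):

```python
import numpy as np

def prox_group_lasso(x, lam):
    """Prox of x -> lam * ||x||_2, via the Moreau decomposition:
    prox(x) = x - proj_{lam * B_2}(x), i.e. block soft-thresholding
    (1 - lam / ||x||)_+ * x."""
    norm_x = np.linalg.norm(x)
    if norm_x <= lam:
        # x lies inside the ball of radius lam: the prox is 0.
        return np.zeros_like(x)
    return (1 - lam / norm_x) * x
```

Note that the whole block of variables is shrunk by a common factor, and is set to zero exactly when `||x|| \le \lambda`.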
Proximity operator of the group Lasso with positivity constraints
Let
Let `x \in \mathbb{R}^p` and let `S = \{ j \in \{1, \ldots, p\} : x_j > 0 \}` be the set of indices of its positive coordinates, then
and
As before, using the Moreau decomposition and Equation (3) yields
and thus, combined with Equations (4) and (5), it leads to
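Concretely, the prox of the positive group Lasso clips the negative coordinates to zero, then applies block soft-thresholding to what remains. A small NumPy sketch of that closed form (the function name is ours):

```python
import numpy as np

def prox_pos_group_lasso(x, lam):
    """Prox of x -> lam * ||x||_2 + indicator(x >= 0):
    zero out the negative coordinates, then block soft-threshold."""
    x_pos = np.maximum(x, 0)        # projection onto R_+^p
    norm = np.linalg.norm(x_pos)
    if norm <= lam:
        return np.zeros_like(x)
    return (1 - lam / norm) * x_pos
```

The shrinkage factor depends only on the positive part of `x`, consistently with the derivation above.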
Subdifferential of the positive Group Lasso penalty
For the `subdiff_diff` working set strategy, we compute the distance `D(v)` of a vector `v` to the subdifferential of the penalty `h` at a point `w`.
Since the penalty is group-separable, we reduce to the case of a single group, where `w` is a block of variables in `\mathbb{R}^g`.
Case `w \notin \mathbb{R}_+^g`
If any component of `w` is strictly negative, the subdifferential is empty, and the distance is `+ \infty`.
Case `w = 0`
At `w = 0`, the subdifferential is:
where `\mathcal{B}_2` is the unit `\ell_2` ball.
Therefore, the distance to the subdifferential reads
Minimizing over `n` then over `u`, thanks to [1], yields
where `v^+` is the positive part of `v` (its negative coordinates set to 0). Intuitively, if `v_i < 0`, we can cancel it exactly in the objective function by taking `n_i = -v_i` and `u_i = 0`; on the other hand, if `v_i > 0`, taking a nonzero `n_i` only increases the quantity that `u_i` needs to bring closer to 0.
For a rigorous derivation of this, introduce the Lagrangian of the squared objective
and write down the optimality conditions with respect to `u` and `n`. Treat the case `\nu = 0` separately; in the other case, show that `u` must be positive and that `v = (1 + \nu) u + n`; together with `u = \mu / \nu` and complementary slackness, this leads to the conclusion.
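In code, the distance `(||v^+|| - \lambda)_+` at `w = 0` is a one-liner. A sketch under that formula (the function name is ours):

```python
import numpy as np

def dist_subdiff_zero(v, lam):
    """Distance from v to lam * B_2 + R_-^g, the subdifferential
    of the positive group Lasso penalty at w = 0."""
    v_pos = np.maximum(v, 0)  # negative coordinates are absorbed by n in R_-^g
    return max(0.0, np.linalg.norm(v_pos) - lam)
```

Sampling feasible points `u + n` with `u \in \lambda \mathcal{B}_2` and `n \in \mathbb{R}_-^g` confirms that none is closer to `v` than this value.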
Case `|| w || \ne 0`
In that case, the subdifferential is `\lambda w / ||w|| + C_1 \times \ldots \times C_g`, where `C_j = \{0\}` if `w_j > 0` and `C_j = \mathbb{R}_-` otherwise (`w_j = 0`).
Letting `p` denote the projection of `v` onto this set, one has
and
The distance to the subdifferential is then:
since `v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j)`.
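Putting the pieces together, the projection and distance for a nonzero `w \ge 0` can be computed coordinate-wise. A sketch following the formulas above (the function name is ours):

```python
import numpy as np

def dist_subdiff_nonzero(v, w, lam):
    """Distance from v to lam * w / ||w|| + C_1 x ... x C_g,
    with C_j = {0} if w_j > 0 and C_j = R_- if w_j = 0 (w >= 0 assumed)."""
    center = lam * w / np.linalg.norm(w)
    # Coordinate-wise projection: pinned to the center where w_j > 0
    # (C_j = {0}); free to move into R_- where w_j = 0 (center_j is 0 there).
    p = np.where(w > 0, center, np.minimum(v, 0))
    return np.linalg.norm(v - p)
```

At coordinates where `w_j = 0`, only the positive part `\max(0, v_j)` contributes to the distance, matching the identity above.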