How to add a custom penalty
skglm supports arbitrary proximable penalties. A penalty is implemented as a Numba jitclass which must inherit from the BasePenalty class:
class BasePenalty:
    """Base class for penalty subclasses."""

    def get_spec(self):
        """Specify the numba types of the class attributes.

        Returns
        -------
        spec : Tuple of (attribute_name, dtype)
            Spec to be passed to Numba jitclass to compile the class.
        """

    def params_to_dict(self):
        """Get the parameters to initialize an instance of the class.

        Returns
        -------
        dict_of_params : dict
            The parameters to instantiate an object of the class.
        """

    def value(self, w):
        """Value of penalty at vector w."""

    def is_penalized(self, n_features):
        """Return a binary mask with the penalized features."""

    def generalized_support(self, w):
        """Return a mask which is True for coefficients in the generalized support."""
To implement your own penalty, you only need to define a new jitclass inheriting from BasePenalty and implement the methods required by the targeted solver. These methods are listed in the solver's documentation.
A case in point: defining the L1 penalty
We detail how the `\ell_1` penalty is implemented in skglm. For a vector `\beta \in \mathbb{R}^p`, the `\ell_1` penalty is defined as follows:

`\lambda \lVert \beta \rVert_1 = \lambda \sum_{j=1}^p |\beta_j| \ .`

The regularization level is controlled by the hyperparameter `\lambda \in \mathbb{R}^+`, which is defined and initialized in the constructor of the class.
The method get_spec strongly types the attributes of the penalty object, thus allowing Numba to JIT-compile the class. It should return an iterable of tuples, the first element being the name of the attribute, the second its Numba type (e.g. float64, bool_).
Additionally, a penalty should implement params_to_dict, a helper method that returns all the parameters of the penalty in a dictionary.
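For instance, here is a minimal sketch of these two methods for a hypothetical weighted penalty with a scalar and an array attribute (the import path of BasePenalty may vary across skglm versions):

import numpy as np
from numba import float64

from skglm.penalties.base import BasePenalty


class WeightedPenalty(BasePenalty):  # hypothetical example penalty
    def __init__(self, alpha, weights):
        self.alpha = alpha
        self.weights = weights

    def get_spec(self):
        # one (attribute_name, numba_dtype) pair per attribute;
        # 1D float arrays are typed with the [:] syntax
        return (
            ('alpha', float64),
            ('weights', float64[:]),
        )

    def params_to_dict(self):
        # must mirror the keyword arguments of __init__
        return dict(alpha=self.alpha, weights=self.weights)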
To optimize an objective with a given penalty, skglm needs at least the proximal operator of the penalty applied to the `j`-th coordinate. For the L1 penalty, it is the well-known soft-thresholding operator:

`\mathrm{ST}(\beta_j, \lambda) = \max(0, |\beta_j| - \lambda) \, \mathrm{sign}(\beta_j) \ .`

Note that skglm expects the threshold level to be the regularization hyperparameter `\lambda \in \mathbb{R}^+` scaled by the stepsize.
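A minimal NumPy sketch of this operator (the implementation below relies on skglm's compiled ST helper instead):

import numpy as np

def soft_threshold(x, threshold):
    """Shrink x toward zero by `threshold`; entries smaller than it become 0."""
    return np.sign(x) * np.maximum(0., np.abs(x) - threshold)

# prox of lambda * |.| with stepsize t is soft_threshold(x, lambda * t)
print(soft_threshold(np.array([1.5, -0.3, 2.0]), 1.0))  # [ 0.5 -0.   1. ]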
Besides, by default all solvers in skglm have their ws_strategy set to subdiff. This means that the optimality conditions (and thus the stopping criterion) are computed using the method subdiff_distance of the penalty. If it is not implemented, the user should set ws_strategy to fixpoint.
For the `\ell_1` penalty, the distance of the negative gradient of the datafit `F` to the subdifferential of the penalty reads

`\mathrm{dist}(-\nabla_j F(\beta), \lambda \, \partial |\beta_j|) = \max(0, |\nabla_j F(\beta)| - \lambda)` if `\beta_j = 0`, and `|\nabla_j F(\beta) + \lambda \, \mathrm{sign}(\beta_j)|` otherwise.
The method is_penalized returns a binary mask with the penalized features. For the `\ell_1` penalty, all the coefficients are penalized.
Finally, generalized_support returns the generalized support of the penalty for some coefficient vector w. For `\ell_1`, it is typically the set of non-zero coefficients of the solution vector.
Optionally, a penalty might implement alpha_max, which returns the smallest `\lambda` for which the optimal solution is a null vector. Note that since lambda is a reserved keyword in Python, alpha in the skglm codebase corresponds to `\lambda`.
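As an illustration, assume the quadratic datafit `F(\beta) = \lVert y - X\beta \rVert^2 / (2n)`: its gradient at `\beta = 0` is `-X^\top y / n`, so alpha_max can be computed as in the following sketch.

import numpy as np

def lasso_alpha_max(X, y):
    """Smallest regularization level for which the Lasso solution is all-zero."""
    n_samples = X.shape[0]
    gradient0 = - X.T @ y / n_samples  # gradient of the quadratic datafit at 0
    return np.max(np.abs(gradient0))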
Putting it all together gives the following implementation of the L1 penalty:
import numpy as np
from numba import float64, bool_

from skglm.penalties.base import BasePenalty
from skglm.utils.prox_funcs import ST  # compiled soft-thresholding helper


class L1(BasePenalty):
    r""":math:`\ell_1` penalty."""

    def __init__(self, alpha, positive=False):
        self.alpha = alpha
        self.positive = positive

    def get_spec(self):
        spec = (
            ('alpha', float64),
            ('positive', bool_),
        )
        return spec

    def params_to_dict(self):
        return dict(alpha=self.alpha, positive=self.positive)

    def value(self, w):
        """Compute L1 penalty value."""
        return self.alpha * np.sum(np.abs(w))

    def prox_1d(self, value, stepsize, j):
        """Compute proximal operator of the L1 penalty (soft-thresholding operator)."""
        return ST(value, self.alpha * stepsize, self.positive)

    def subdiff_distance(self, w, grad, ws):
        """Compute distance of negative gradient to the subdifferential at w."""
        subdiff_dist = np.zeros_like(grad)
        for idx, j in enumerate(ws):
            if self.positive:
                if w[j] < 0:
                    subdiff_dist[idx] = np.inf
                elif w[j] == 0:
                    # distance of -grad_j to (-infty, alpha]
                    subdiff_dist[idx] = max(0, -grad[idx] - self.alpha)
                else:
                    # distance of -grad_j to {alpha}
                    subdiff_dist[idx] = np.abs(grad[idx] + self.alpha)
            else:
                if w[j] == 0:
                    # distance of -grad_j to [-alpha, alpha]
                    subdiff_dist[idx] = max(0, np.abs(grad[idx]) - self.alpha)
                else:
                    # distance of -grad_j to {alpha * sign(w[j])}
                    subdiff_dist[idx] = np.abs(grad[idx] + np.sign(w[j]) * self.alpha)
        return subdiff_dist

    def is_penalized(self, n_features):
        """Return a binary mask with the penalized features."""
        return np.ones(n_features, dtype=np.bool_)

    def generalized_support(self, w):
        """Return a mask with non-zero coefficients."""
        return w != 0

    def alpha_max(self, gradient0):
        """Return penalization value for which 0 is solution."""
        return np.max(np.abs(gradient0))
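Once defined, the penalty can be plugged into skglm's estimators. Below is a usage sketch, assuming the Quadratic datafit and the AndersonCD solver (skglm takes care of JIT-compiling the jitclass internally):

import numpy as np

from skglm import GeneralizedLinearEstimator
from skglm.datafits import Quadratic
from skglm.solvers import AndersonCD

# toy data for illustration
X = np.random.randn(50, 100)
y = np.random.randn(50)

estimator = GeneralizedLinearEstimator(
    datafit=Quadratic(),
    penalty=L1(alpha=0.1),
    solver=AndersonCD(),
)
estimator.fit(X, y)
print(estimator.coef_)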