How to add a custom penalty
skglm supports arbitrary proximable penalties. A penalty is implemented as a Numba jitclass which must inherit from the BasePenalty class:
class BasePenalty:
    """Base class for penalty subclasses."""

    def get_spec(self):
        """Specify the numba types of the class attributes.

        Returns
        -------
        spec : Tuple of (attribute_name, dtype)
            Spec to be passed to Numba jitclass to compile the class.
        """

    def params_to_dict(self):
        """Get the parameters to initialize an instance of the class.

        Returns
        -------
        dict_of_params : dict
            The parameters to instantiate an object of the class.
        """

    def value(self, w):
        """Value of penalty at vector w."""

    def is_penalized(self, n_features):
        """Return a binary mask with the penalized features."""

    def generalized_support(self, w):
        """Return a mask which is True for coefficients in the generalized support."""
To implement your own penalty, you only need to define a new jitclass inheriting from BasePenalty and implement the methods required by the targeted solver. These methods are listed in the solver's documentation.
A case in point: defining the L1 penalty
We detail how the `\ell_1` penalty is implemented in skglm. For a vector `\beta \in \mathbb{R}^p`, the `\ell_1` penalty is defined as follows:

`\lambda \lVert \beta \rVert_1 = \lambda \sum_{j=1}^p |\beta_j| \ .`

The regularization level is controlled by the hyperparameter `\lambda \in \mathbb{R}^+`, which is defined and initialized in the constructor of the class.
The method get_spec strongly types the attributes of the penalty object, thus allowing Numba to JIT-compile the class. It should return an iterable of tuples, the first element being the name of the attribute, the second its Numba type (e.g. float64, bool_).
Additionally, a penalty should implement params_to_dict, a helper method that returns all the parameters of the penalty in a dictionary.
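For instance, here is a minimal sketch of these two methods for a hypothetical weighted penalty with a scalar and an array attribute (the import path of BasePenalty may vary across skglm versions):

import numpy as np
from numba import float64

from skglm.penalties.base import BasePenalty


class WeightedPenalty(BasePenalty):  # hypothetical example penalty
    def __init__(self, alpha, weights):
        self.alpha = alpha
        self.weights = weights

    def get_spec(self):
        # one (attribute_name, numba_dtype) pair per attribute;
        # 1D float arrays are typed with the [:] syntax
        return (
            ('alpha', float64),
            ('weights', float64[:]),
        )

    def params_to_dict(self):
        # must mirror the keyword arguments of __init__
        return dict(alpha=self.alpha, weights=self.weights)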
To optimize an objective with a given penalty, skglm needs at least the proximal operator of the penalty applied to the `j`-th coordinate. For the L1 penalty, it is the well-known soft-thresholding operator:

`\mathrm{ST}(\beta_j, \lambda) = \max(0, |\beta_j| - \lambda) \, \mathrm{sign}(\beta_j) \ .`

Note that skglm expects the threshold level to be the regularization hyperparameter `\lambda \in \mathbb{R}^+` scaled by the stepsize.
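A minimal NumPy sketch of this operator (the implementation below relies on skglm's compiled ST helper instead):

import numpy as np

def soft_threshold(x, threshold):
    """Shrink x toward zero by `threshold`; entries smaller than it become 0."""
    return np.sign(x) * np.maximum(0., np.abs(x) - threshold)

# prox of lambda * |.| with stepsize t is soft_threshold(x, lambda * t)
print(soft_threshold(np.array([1.5, -0.3, 2.0]), 1.0))  # [ 0.5 -0.   1. ]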
Besides, by default all solvers in skglm have their ws_strategy set to subdiff. This means that the optimality conditions (and thus the stopping criterion) are computed using the method subdiff_distance of the penalty. If it is not implemented, the user should set ws_strategy to fixpoint.
For the `\ell_1` penalty, the distance of the negative gradient of the datafit `F` to the subdifferential of the penalty reads

`\mathrm{dist}(-\nabla_j F(\beta), \lambda \, \partial |\beta_j|) = \max(0, |\nabla_j F(\beta)| - \lambda)` if `\beta_j = 0`, and `|\nabla_j F(\beta) + \lambda \, \mathrm{sign}(\beta_j)|` otherwise.
The method is_penalized returns a binary mask with the penalized features. For the `\ell_1` penalty, all the coefficients are penalized.
Finally, generalized_support returns the generalized support of the penalty for some coefficient vector w. For `\ell_1`, it is typically the set of non-zero coefficients of the solution vector.
Optionally, a penalty might implement alpha_max, which returns the smallest `\lambda` for which the optimal solution is a null vector. Note that since lambda is a reserved keyword in Python, alpha in the skglm codebase corresponds to `\lambda`.
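As an illustration, assume the quadratic datafit `F(\beta) = \lVert y - X\beta \rVert^2 / (2n)`: its gradient at `\beta = 0` is `-X^\top y / n`, so alpha_max can be computed as in the following sketch.

import numpy as np

def lasso_alpha_max(X, y):
    """Smallest regularization level for which the Lasso solution is all-zero."""
    n_samples = X.shape[0]
    gradient0 = - X.T @ y / n_samples  # gradient of the quadratic datafit at 0
    return np.max(np.abs(gradient0))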
Putting it all together gives the following implementation of the L1 penalty:
import numpy as np
from numba import float64, bool_

from skglm.penalties.base import BasePenalty
from skglm.utils.prox_funcs import ST  # compiled soft-thresholding helper


class L1(BasePenalty):
    r""":math:`\ell_1` penalty."""

    def __init__(self, alpha, positive=False):
        self.alpha = alpha
        self.positive = positive

    def get_spec(self):
        spec = (
            ('alpha', float64),
            ('positive', bool_),
        )
        return spec

    def params_to_dict(self):
        return dict(alpha=self.alpha, positive=self.positive)

    def value(self, w):
        """Compute L1 penalty value."""
        return self.alpha * np.sum(np.abs(w))

    def prox_1d(self, value, stepsize, j):
        """Compute proximal operator of the L1 penalty (soft-thresholding operator)."""
        return ST(value, self.alpha * stepsize, self.positive)

    def subdiff_distance(self, w, grad, ws):
        """Compute distance of negative gradient to the subdifferential at w."""
        subdiff_dist = np.zeros_like(grad)
        for idx, j in enumerate(ws):
            if self.positive:
                if w[j] < 0:
                    subdiff_dist[idx] = np.inf
                elif w[j] == 0:
                    # distance of -grad_j to (-infty, alpha]
                    subdiff_dist[idx] = max(0, -grad[idx] - self.alpha)
                else:
                    # distance of -grad_j to {alpha}
                    subdiff_dist[idx] = np.abs(grad[idx] + self.alpha)
            else:
                if w[j] == 0:
                    # distance of -grad_j to [-alpha, alpha]
                    subdiff_dist[idx] = max(0, np.abs(grad[idx]) - self.alpha)
                else:
                    # distance of -grad_j to {alpha * sign(w[j])}
                    subdiff_dist[idx] = np.abs(grad[idx] + np.sign(w[j]) * self.alpha)
        return subdiff_dist

    def is_penalized(self, n_features):
        """Return a binary mask with the penalized features."""
        return np.ones(n_features, dtype=np.bool_)

    def generalized_support(self, w):
        """Return a mask with non-zero coefficients."""
        return w != 0

    def alpha_max(self, gradient0):
        """Return penalization value for which 0 is solution."""
        return np.max(np.abs(gradient0))
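Once defined, the penalty can be plugged into skglm's estimators. Below is a usage sketch, assuming the Quadratic datafit and the AndersonCD solver (skglm takes care of JIT-compiling the jitclass internally):

import numpy as np

from skglm import GeneralizedLinearEstimator
from skglm.datafits import Quadratic
from skglm.solvers import AndersonCD

# toy data for illustration
X = np.random.randn(50, 100)
y = np.random.randn(50)

estimator = GeneralizedLinearEstimator(
    datafit=Quadratic(),
    penalty=L1(alpha=0.1),
    solver=AndersonCD(),
)
estimator.fit(X, y)
print(estimator.coef_)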