How to add a custom datafit¶
Motivated by generalized linear models but not limited to them, skglm solves problems of the form

`\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} F(X\beta) + \Omega(\beta) .`
Here, `X \in \mathbb{R}^{n \times p}` denotes the design matrix with `n` samples and `p` features, and `\beta \in \mathbb{R}^p` is the coefficient vector.
skglm can solve any problem of this form with an arbitrary smooth datafit `F` and an arbitrary penalty `\Omega` whose proximal operator can be evaluated explicitly, by defining two classes: a Penalty and a Datafit.
They can then be passed to a GeneralizedLinearEstimator.
clf = GeneralizedLinearEstimator(
    MyDatafit(),
    MyPenalty(),
)
A Datafit is a jitclass that must inherit from the BaseDatafit class:
class BaseDatafit:
    """Base class for datafits."""

    def get_spec(self):
        """Specify the numba types of the class attributes.

        Returns
        -------
        spec : tuple of (attribute_name, dtype)
            Spec to be passed to Numba jitclass to compile the class.
        """

    def params_to_dict(self):
        """Get the parameters to initialize an instance of the class.

        Returns
        -------
        dict_of_params : dict
            The parameters to instantiate an object of the class.
        """

    def initialize(self, X, y):
        """Pre-computations before fitting on X and y.

        Parameters
        ----------
        X : array, shape (n_samples, n_features)
            Design matrix.

        y : array, shape (n_samples,)
            Target vector.
        """

    def initialize_sparse(self, X_data, X_indptr, X_indices, y):
        """Pre-computations before fitting on X and y when X is a sparse matrix.

        Parameters
        ----------
        X_data : array, shape (n_elements,)
            `data` attribute of the sparse CSC matrix X.

        X_indptr : array, shape (n_features + 1,)
            `indptr` attribute of the sparse CSC matrix X.

        X_indices : array, shape (n_elements,)
            `indices` attribute of the sparse CSC matrix X.

        y : array, shape (n_samples,)
            Target vector.
        """

    def value(self, y, w, Xw):
        """Value of datafit at vector w.

        Parameters
        ----------
        y : array_like, shape (n_samples,)
            Target vector.

        w : array_like, shape (n_features,)
            Coefficient vector.

        Xw : array_like, shape (n_samples,)
            Model fit.

        Returns
        -------
        value : float
            The datafit value at vector w.
        """
To define a custom datafit, you need to inherit from the BaseDatafit class and implement the methods required by the targeted solver.
These methods can be found in the solver's documentation.
Optionally, overloading the methods with the suffix _sparse adds support for sparse datasets (CSC matrices).
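As a side illustration (plain SciPy, not skglm code), the three CSC attributes passed to the `_sparse` methods encode the non-zeros of `X` column by column; a `_sparse` method typically iterates over one column like this:

```python
import numpy as np
from scipy import sparse

X = np.array([[1., 0., 2.],
              [0., 3., 0.],
              [4., 0., 5.]])
X_csc = sparse.csc_matrix(X)

# Iterate over the non-zero entries of column j, as a *_sparse method would:
# entries of column j live in data[indptr[j]:indptr[j + 1]], and indices[i]
# gives the row of each entry.
j = 2
col_sum = 0.
for i in range(X_csc.indptr[j], X_csc.indptr[j + 1]):
    row = X_csc.indices[i]   # row index of the non-zero entry
    col_sum += X_csc.data[i]  # value of X[row, j]
print(col_sum)  # 7.0 (= 2. + 5.)
```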
This tutorial shows how to implement the Poisson datafit to be fitted with the ProxNewton solver.
A case in point: defining Poisson datafit¶
First, this requires deriving some quantities used by the solvers, like the gradient or the Hessian matrix of the datafit. With `y \in \mathbb{R}^n` the target vector, the Poisson datafit reads

`F(X\beta) = \frac{1}{n} \sum_{i=1}^{n} \left( e^{(X\beta)_i} - y_i (X\beta)_i \right) .`
Let’s define some useful quantities to simplify our computations. For `z \in \mathbb{R}^n` and `\beta \in \mathbb{R}^p`,

`f(z) = \sum_{i=1}^{n} f_i(z_i) , \qquad F(\beta) = f(X\beta) .`
Computing the gradient of `F` and its Hessian matrix yields

`\nabla F(\beta) = X^\top \nabla f(X\beta) , \qquad \nabla^2 F(\beta) = X^\top \nabla^2 f(X\beta) X .`
Besides, it directly follows that

`\nabla f(z) = \left( f_1'(z_1), \ldots, f_n'(z_n) \right)^\top , \qquad \nabla^2 f(z) = \mathrm{diag}\left( f_1''(z_1), \ldots, f_n''(z_n) \right) .`
We can now apply these definitions to the Poisson datafit:

`f_i(z_i) = \frac{1}{n} \left( e^{z_i} - y_i z_i \right) .`

Therefore,

`f_i'(z_i) = \frac{1}{n} \left( e^{z_i} - y_i \right) , \qquad f_i''(z_i) = \frac{1}{n} e^{z_i} .`
Computing raw_grad and raw_hessian for the Poisson datafit yields

`\nabla f(X\beta) = \frac{1}{n} \left( e^{(X\beta)_i} - y_i \right)_{1 \le i \le n} , \qquad \mathrm{diag}\left( \nabla^2 f(X\beta) \right) = \frac{1}{n} \left( e^{(X\beta)_i} \right)_{1 \le i \le n} .`
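These closed forms can be sanity-checked numerically. The snippet below (plain NumPy, not part of skglm) verifies that central finite differences of the datafit value reproduce raw_grad, and that finite differences of raw_grad reproduce raw_hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.poisson(2.0, size=n).astype(float)
Xw = rng.normal(size=n)
eps = 1e-6

def value(Xw):
    # Poisson datafit value: (1/n) * sum_i (exp(Xw_i) - y_i * Xw_i)
    return np.sum(np.exp(Xw) - y * Xw) / n

def raw_grad(Xw):
    # closed-form gradient w.r.t. Xw
    return (np.exp(Xw) - y) / n

raw_hessian = np.exp(Xw) / n  # closed-form diagonal Hessian w.r.t. Xw

# central finite differences of the value reproduce raw_grad ...
fd_grad = np.array([
    (value(Xw + eps * np.eye(n)[i]) - value(Xw - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
# ... and finite differences of raw_grad reproduce raw_hessian
fd_hess = np.array([
    (raw_grad(Xw + eps * np.eye(n)[i])[i] - raw_grad(Xw - eps * np.eye(n)[i])[i]) / (2 * eps)
    for i in range(n)
])

print(np.allclose(fd_grad, raw_grad(Xw), atol=1e-6))  # True
print(np.allclose(fd_hess, raw_hessian, atol=1e-6))   # True
```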
Both raw_grad and raw_hessian are methods used by the ProxNewton solver.
Other optimizers require different methods to be implemented. For instance, AndersonCD uses the gradient_scalar method:
it is the derivative of the datafit with respect to the `j`-th coordinate of `\beta`.
For the Poisson datafit, this yields

`\nabla_j F(\beta) = \frac{1}{n} \sum_{i=1}^{n} X_{i,j} \left( e^{(X\beta)_i} - y_i \right) .`
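As a quick check (plain NumPy, outside skglm), this coordinate-wise formula is just the `j`-th entry of the full gradient `X^\top \nabla f(X\beta)`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))
y = rng.poisson(1.5, size=n).astype(float)
w = rng.normal(size=p)
Xw = X @ w

# j-th coordinate of the full gradient X.T @ raw_grad ...
full_grad = X.T @ ((np.exp(Xw) - y) / n)

# ... matches the gradient_scalar formula, evaluated at j
j = 1
grad_j = X[:, j] @ (np.exp(Xw) - y) / n

print(np.isclose(grad_j, full_grad[j]))  # True
```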
When implementing these quantities in the Poisson datafit class, this gives:
class Poisson(BaseDatafit):
    r"""Poisson datafit.

    The datafit reads:

    .. math:: \frac{1}{n_{\text{samples}}} \sum_{i=1}^{n_{\text{samples}}}
        \left( e^{(Xw)_i} - y_i (Xw)_i \right)

    Notes
    -----
    The class is jit compiled at fit time using the Numba compiler.
    This allows for faster computations.
    """

    def __init__(self):
        pass

    def get_spec(self):
        pass

    def params_to_dict(self):
        return dict()

    def initialize(self, X, y):
        if np.any(y < 0):
            raise ValueError(
                "Target vector `y` should only take positive values "
                "when fitting a Poisson model.")

    def initialize_sparse(self, X_data, X_indptr, X_indices, y):
        if np.any(y < 0):
            raise ValueError(
                "Target vector `y` should only take positive values "
                "when fitting a Poisson model.")

    def raw_grad(self, y, Xw):
        """Compute gradient of datafit w.r.t ``Xw``."""
        return (np.exp(Xw) - y) / len(y)

    def raw_hessian(self, y, Xw):
        """Compute Hessian of datafit w.r.t ``Xw``."""
        return np.exp(Xw) / len(y)

    def value(self, y, w, Xw):
        return np.sum(np.exp(Xw) - y * Xw) / len(y)

    def gradient(self, X, y, Xw):
        return X.T @ self.raw_grad(y, Xw)

    def gradient_scalar(self, X, y, w, Xw, j):
        return (X[:, j] @ (np.exp(Xw) - y)) / len(y)

    def full_grad_sparse(self, X_data, X_indptr, X_indices, y, Xw):
        n_features = X_indptr.shape[0] - 1
        grad = np.zeros(n_features, dtype=X_data.dtype)
        for j in range(n_features):
            grad[j] = 0.
            for i in range(X_indptr[j], X_indptr[j + 1]):
                grad[j] += X_data[i] * (
                    np.exp(Xw[X_indices[i]]) - y[X_indices[i]]) / len(y)
        return grad

    def gradient_scalar_sparse(self, X_data, X_indptr, X_indices, y, Xw, j):
        grad = 0.
        for i in range(X_indptr[j], X_indptr[j + 1]):
            idx_i = X_indices[i]
            grad += X_data[i] * (np.exp(Xw[idx_i]) - y[idx_i])
        return grad / len(y)

    def intercept_update_step(self, y, Xw):
        return np.sum(self.raw_grad(y, Xw))
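As a sanity check (a self-contained NumPy/SciPy sketch, not part of skglm), we can verify that the dense gradient `X^\top \nabla f(X\beta)` and the CSC loop used in full_grad_sparse produce the same result:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, p = 8, 4
# a sparse-ish design matrix: roughly half the entries are zeroed out
X = rng.normal(size=(n, p)) * (rng.random((n, p)) < 0.5)
y = rng.poisson(1.0, size=n).astype(float)
w = rng.normal(size=p)
Xw = X @ w

# dense gradient: X.T @ raw_grad
dense_grad = X.T @ ((np.exp(Xw) - y) / n)

# sparse gradient, mirroring the full_grad_sparse loop over CSC attributes
X_csc = sparse.csc_matrix(X)
sparse_grad = np.zeros(p)
for j in range(p):
    for i in range(X_csc.indptr[j], X_csc.indptr[j + 1]):
        row = X_csc.indices[i]
        sparse_grad[j] += X_csc.data[i] * (np.exp(Xw[row]) - y[row]) / n

print(np.allclose(dense_grad, sparse_grad))  # True
```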
Note that we have not initialized any quantities in the initialize method.
Usually, it serves to precompute datafit attributes specific to a dataset `X, y` for computational efficiency, for example the computation of `X.T @ y` in the Quadratic datafit.
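As a minimal sketch of this pattern (hypothetical class and attribute names, not skglm's actual Quadratic implementation, assuming the datafit `\frac{1}{2n} \lVert y - Xw \rVert^2`), initialize can cache such a quantity once and reuse it in later gradient calls:

```python
import numpy as np

class QuadraticSketch:
    """Non-jitted sketch of a quadratic datafit with precomputation."""

    def initialize(self, X, y):
        # Cache X.T @ y once; it is reused at every gradient evaluation.
        self.Xty = X.T @ y

    def gradient_scalar(self, X, y, w, Xw, j):
        # d/dw_j of (1/(2n)) * ||y - Xw||^2 = (X[:, j] @ Xw - (X.T y)_j) / n
        return (X[:, j] @ Xw - self.Xty[j]) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
w = rng.normal(size=3)

df = QuadraticSketch()
df.initialize(X, y)       # precompute X.T @ y
g0 = df.gradient_scalar(X, y, w, X @ w, 0)
```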