User Guide#
Estimator#
The central piece of every transformer, regressor, and classifier is
sklearn.base.BaseEstimator. All estimators in scikit-learn are derived
from this class. More specifically, this base class provides the get_params
and set_params methods to get and set the parameters of the estimator. It
can be imported as:
>>> from sklearn.base import BaseEstimator
Once imported, you can create a class which inherits from this base class:
>>> class MyOwnEstimator(BaseEstimator):
...     pass
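As a minimal sketch of the parameter handling that BaseEstimator provides, the example below defines a hypothetical ScaledEstimator (the class name and its scale and shift parameters are assumptions made for illustration only). It relies on the scikit-learn convention that each constructor argument is stored unchanged on an attribute of the same name:

```python
from sklearn.base import BaseEstimator, clone


class ScaledEstimator(BaseEstimator):
    """Hypothetical estimator used only to illustrate parameter handling."""

    def __init__(self, scale=1.0, shift=0.0):
        # Store each constructor argument unchanged under the same name;
        # get_params reads the parameter names from the __init__ signature.
        self.scale = scale
        self.shift = shift


est = ScaledEstimator(scale=2.0)
params = est.get_params()    # dict mapping parameter names to values
est.set_params(shift=-1.0)   # updates the parameter in place, returns est
copy = clone(est)            # new, unfitted estimator with the same parameters
```

Because parameters are exposed this way, utilities such as clone and the model selection tools can copy and reconfigure an estimator without knowing its concrete class.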
Transformer#
Transformers are scikit-learn estimators which implement a transform method.
The use case is the following:
* at fit, some parameters can be learned from X and y;
* at transform, X will be transformed, using the parameters learned during fit.
In addition, scikit-learn provides a
mixin, i.e. sklearn.base.TransformerMixin, which
implements fit_transform, the combination of fit and transform.
One can import the mixin class as:
>>> from sklearn.base import TransformerMixin
Therefore, when creating a transformer, you need to create a class which
inherits from both sklearn.base.BaseEstimator and
sklearn.base.TransformerMixin. The scikit-learn API requires fit to
return self. This allows method calls to be chained, which is how the
fit_transform method provided by sklearn.base.TransformerMixin combines
fit and transform. The fit method is expected to have X and y as inputs.
Note that transform takes only X as input and is expected to return the
transformed version of X:
>>> class MyOwnTransformer(TransformerMixin, BaseEstimator):
...     def fit(self, X, y=None):
...         return self
...     def transform(self, X):
...         return X
We build a basic example to show that our MyOwnTransformer is working
within a scikit-learn pipeline:
>>> from sklearn.datasets import load_iris
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> pipe = make_pipeline(MyOwnTransformer(),
...                      LogisticRegression(random_state=10,
...                                         solver='lbfgs'))
>>> pipe.fit(X, y)
Pipeline(...)
>>> pipe.predict(X)
array([...])
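MyOwnTransformer returns X unchanged, so nothing is learned during fit. As a rough sketch of a transformer that does learn a parameter, the hypothetical CenteringTransformer below (the class name, the mean_ attribute, and the toy data are assumptions for illustration) stores the column means at fit and subtracts them at transform:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class CenteringTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that removes the per-column mean."""

    def fit(self, X, y=None):
        # Learned state is stored on an attribute ending with an underscore.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # returning self is what makes chaining possible

    def transform(self, X):
        # Reuse the means learned during fit.
        return np.asarray(X, dtype=float) - self.mean_


X_demo = np.array([[1.0, 10.0], [3.0, 30.0]])
# fit_transform is inherited from TransformerMixin.
centered = CenteringTransformer().fit_transform(X_demo)
# The same result, written as an explicit chain of fit and transform.
chained = CenteringTransformer().fit(X_demo).transform(X_demo)
```

Both calls give the same centered array, which is the behavior that fit returning self is meant to guarantee.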
Predictor#
Regressor#
Similarly, regressors are scikit-learn estimators which implement a predict
method. The use case is the following:
* at fit, some parameters can be learned from X and y;
* at predict, predictions will be computed from X using the parameters learned during fit.
In addition, scikit-learn provides a mixin, i.e.
sklearn.base.RegressorMixin, which implements the score method
that computes the \(R^2\) score of the predictions.
One can import the mixin as:
>>> from sklearn.base import RegressorMixin
Therefore, we create a regressor, MyOwnRegressor, which inherits from
both sklearn.base.BaseEstimator and
sklearn.base.RegressorMixin. The fit method takes X and y
as input and should return self. The class should also implement the
predict method, which should output the predictions of your regressor:
>>> import numpy as np
>>> class MyOwnRegressor(RegressorMixin, BaseEstimator):
...     def fit(self, X, y):
...         return self
...     def predict(self, X):
...         return np.mean(X, axis=1)
We illustrate that this regressor is working within a scikit-learn pipeline:
>>> from sklearn.datasets import load_diabetes
>>> X, y = load_diabetes(return_X_y=True)
>>> pipe = make_pipeline(MyOwnTransformer(), MyOwnRegressor())
>>> pipe.fit(X, y)
Pipeline(...)
>>> pipe.predict(X)
array([...])
Since we inherit from the sklearn.base.RegressorMixin, we can call
the score method which will return the \(R^2\) score:
>>> pipe.score(X, y)
-3.9...
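MyOwnRegressor ignores y, which explains the negative score above. As a minimal sketch of a regressor that learns from y, the hypothetical MeanRegressor below (the class name, the mean_ attribute, and the toy data are assumptions for illustration) always predicts the mean of the training targets. On the training data such a constant prediction gives an \(R^2\) of zero, which makes it a convenient sanity check for the inherited score method:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that always predicts the training mean of y."""

    def fit(self, X, y):
        # The only learned parameter is the mean of the targets.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per sample, all equal to the learned mean.
        return np.full(np.asarray(X).shape[0], self.mean_)


X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 2.0, 3.0, 6.0])
reg = MeanRegressor().fit(X_demo, y_demo)
preds = reg.predict(X_demo)
r2 = reg.score(X_demo, y_demo)  # score is inherited from RegressorMixin
```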
Classifier#
Similarly to regressors, classifiers implement predict. In addition, they
output the class probabilities of the predictions using the predict_proba method:
* at fit, some parameters can be learned from X and y;
* at predict, predictions will be computed from X using the parameters learned during fit. The output corresponds to the predicted class for each sample;
* predict_proba will give a 2D matrix where each row corresponds to a sample, each column corresponds to a class, and each entry is the probability of the associated class.
In addition, scikit-learn provides a mixin, i.e.
sklearn.base.ClassifierMixin, which implements the score method
that computes the accuracy score of the predictions.
One can import this mixin as:
>>> from sklearn.base import ClassifierMixin
Therefore, we create a classifier, MyOwnClassifier, which inherits
from both sklearn.base.BaseEstimator and
sklearn.base.ClassifierMixin. The fit method takes X and y
as input and should return self. The class should also implement the
predict method, which should output the class inferred by the classifier.
predict_proba will output some probabilities instead:
>>> class MyOwnClassifier(ClassifierMixin, BaseEstimator):
...     def fit(self, X, y):
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         return np.random.randint(0, self.classes_.size,
...                                  size=X.shape[0])
...     def predict_proba(self, X):
...         pred = np.random.rand(X.shape[0], self.classes_.size)
...         return pred / np.sum(pred, axis=1)[:, np.newaxis]
We illustrate that this classifier is working within a scikit-learn pipeline:
>>> X, y = load_iris(return_X_y=True)
>>> pipe = make_pipeline(MyOwnTransformer(), MyOwnClassifier())
>>> pipe.fit(X, y)
Pipeline(...)
Then, you can call predict and predict_proba:
>>> pipe.predict(X)
array([...])
>>> pipe.predict_proba(X)
array([...])
Since our classifier inherits from sklearn.base.ClassifierMixin, we
can compute the accuracy by calling the score method:
>>> pipe.score(X, y)
0...
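MyOwnClassifier returns random outputs, so its predictions and probabilities are unrelated to each other. As a rough sketch of a deterministic alternative, the hypothetical PriorClassifier below (the class name, the class_prior_ attribute, and the toy data are assumptions for illustration) learns the class frequencies at fit, returns them from predict_proba, and maps the most probable column back to a label through classes_ in predict:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class PriorClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that predicts from class frequencies only."""

    def fit(self, X, y):
        # classes_ holds the sorted unique labels seen during fit.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.class_prior_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # One row per sample, one column per class, each row sums to 1.
        n_samples = np.asarray(X).shape[0]
        return np.tile(self.class_prior_, (n_samples, 1))

    def predict(self, X):
        # Convert the most probable column index back to a class label.
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X_demo = np.zeros((4, 2))
y_demo = np.array(["a", "b", "b", "b"])
clf = PriorClassifier().fit(X_demo, y_demo)
proba = clf.predict_proba(X_demo)
labels = clf.predict(X_demo)
accuracy = clf.score(X_demo, y_demo)  # inherited from ClassifierMixin
```

Indexing classes_ with the column index keeps predict correct when the labels are strings or non-consecutive integers, which returning the raw index would not.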