Exchangeability testing on a fixed dataset
This quickstart demonstrates how to test exchangeability on a fixed dataset.
The guarantees provided by conformal prediction and risk control rely on the
assumption that the data are exchangeable. It is therefore important to verify
exchangeability before applying methods from MAPIE. Typically, (split)
conformal prediction, risk control, and calibration require data not seen
during training, such as a split of the test data.
Note that for the exchangeability test to be valid, the order of samples in
the fixed dataset must be representative of what will happen after deployment.
Shuffling the data beforehand would trivially render the dataset exchangeable
and hide any potential distribution shift.
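To see why shuffling would hide a shift, consider a minimal stdlib-only sketch (unrelated to MAPIE's test internals): a stream whose second half has a shifted mean shows a large gap between the means of its two halves, but the same data shuffled does not.

```python
import random
from statistics import fmean

random.seed(0)

# Stream with an abrupt mean shift in its second half
# (a stand-in for a non-exchangeable fixed dataset).
x = [random.gauss(0, 1) for _ in range(500)] + [random.gauss(2, 1) for _ in range(500)]

def halves_gap(a):
    """Absolute difference between the means of the two halves of a sequence."""
    mid = len(a) // 2
    return abs(fmean(a[:mid]) - fmean(a[mid:]))

gap_ordered = halves_gap(x)       # large: the shift is visible in the ordering
shuffled = x[:]
random.shuffle(shuffled)
gap_shuffled = halves_gap(shuffled)  # small: shuffling erases the order structure
```

Any order-sensitive statistic behaves the same way, which is why the dataset must keep its deployment-representative ordering.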
Prepare the data
We first prepare the data and fit a classifier on the training data.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from mapie._example_utils import generate_gaussian_stream, plot_dataset
from mapie.classification import SplitConformalClassifier
from mapie.exchangeability_testing import FixedDatasetExchangeabilityTest

random_state = 42

X, y = generate_gaussian_stream(
    shift_type="stable",
    random_state=random_state,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=random_state, shuffle=False
)

plot_dataset(
    X_test,
    y_test,
    title="Exchangeable fixed dataset",
)

classifier = LogisticRegression(random_state=random_state)
classifier.fit(X_train, y_train)
Out:
LogisticRegression(random_state=42)
Run the exchangeability test
Now we can test the exchangeability of the test dataset.
By default, we use all available test methods.
Method-specific parameters can be passed as a dictionary.
exchangeability_test = FixedDatasetExchangeabilityTest()
exchangeability_test.run(X_test, y_test)

print("Is the test dataset exchangeable?")
for test_name, is_exchangeable in exchangeability_test.is_exchangeable.items():
    print(f"{test_name}: {is_exchangeable}")
Out:
Is the test dataset exchangeable?
pvalue_permutation: True
permutation_binomial: True
permutation_binomial_mixture: True
permutation_aggressive: True
plugin_martingale: True
jumper_martingale: True
Continue with MAPIE
The test dataset is exchangeable. We can continue with MAPIE.
Conformalization with a split of the test dataset will provide
coverage guarantees on the remaining test data.
X_conformalize, X_test_new, y_conformalize, y_test_new = train_test_split(
    X_test, y_test, test_size=0.5, random_state=random_state
)

confidence_level = 0.95
mapie_classifier = SplitConformalClassifier(
    estimator=classifier, confidence_level=confidence_level, prefit=True
)
mapie_classifier.conformalize(X_conformalize, y_conformalize)
y_pred, y_pred_set = mapie_classifier.predict_set(X_test_new)
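To make the coverage guarantee concrete, here is a from-scratch sketch of split conformal classification (the score/LAC construction) on a toy problem; it does not use MAPIE itself, and the toy classifier and data generator are invented for illustration. The conformity score is 1 minus the probability assigned to the true class, the threshold is a finite-sample quantile of the calibration scores, and the prediction set contains every class whose score falls below the threshold.

```python
import math
import random

random.seed(0)

def score(x):
    """Toy binary classifier: P(class 1 | x) under a fixed logistic model."""
    p1 = 1 / (1 + math.exp(-2 * x))
    return [1 - p1, p1]

def draw(n):
    """Draw (x, y) pairs: balanced classes, x ~ N(2 * y - 1, 1)."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        data.append((random.gauss(2 * y - 1, 1), y))
    return data

# Conformity score on calibration data: 1 - probability of the true class.
cal = draw(1000)
scores = sorted(1 - score(x)[y] for x, y in cal)

confidence_level = 0.95
n = len(scores)
k = math.ceil((n + 1) * confidence_level)  # finite-sample quantile index
q = scores[min(k, n) - 1]

# A test point is covered when its true class enters the prediction set,
# i.e. its conformity score is at most the threshold.
test = draw(2000)
coverage = sum(1 - score(x)[y] <= q for x, y in test) / len(test)
print(f"empirical coverage: {coverage:.3f}")  # ≈ 0.95
```

`SplitConformalClassifier` packages this recipe (with additional options) behind `conformalize` and `predict_set`.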
Create a non-exchangeable dataset
Now let us see what happens for a non-exchangeable fixed dataset.
Here, an abrupt shift happens in the second part of the dataset.
X_test_abrupt, y_test_abrupt = generate_gaussian_stream(
    n_samples=len(X_test),
    shift_type="abrupt",
    prop_shift=0.5,
    random_state=random_state + 1,
)

shift_start_abrupt = int(len(y_test_abrupt) * 0.5)
plot_dataset(
    X_test_abrupt,
    y_test_abrupt,
    title="Non-exchangeable fixed dataset",
    shift_start=shift_start_abrupt,
)
exchangeability_test = FixedDatasetExchangeabilityTest()
exchangeability_test.run(X_test_abrupt, y_test_abrupt)

print("Is the shifted dataset exchangeable?")
for test_name, is_exchangeable in exchangeability_test.is_exchangeable.items():
    print(f"{test_name}: {is_exchangeable}")
Out:
Is the shifted dataset exchangeable?
pvalue_permutation: False
permutation_binomial: False
permutation_binomial_mixture: False
permutation_aggressive: False
plugin_martingale: False
jumper_martingale: True
Interpret the result
The shifted test dataset is not exchangeable: MAPIE cannot provide
statistical guarantees on it, and more generally the classifier itself
should not be trusted without further investigation.
Note that the jumper martingale fails to detect the non-exchangeability in
this case. It mostly reacts to one-sided p-value distortions
(many consistently small p-values, or many consistently large ones).
Here, the shift produces both many very low and many very high p-values,
so the effects cancel out for the jumper martingale.
More generally, this illustrates that no single test is perfect,
and that it is important to use multiple tests to get a complete picture.
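The cancellation effect can be illustrated with a deliberately simplified betting martingale (not MAPIE's jumper or plugin implementation): wealth is multiplied by `1 + eps * (0.5 - p)` for each p-value `p`, which has expectation 1 under uniform p-values, so the process grows only when the p-values are consistently distorted to one side.

```python
def martingale(pvalues, eps=0.5):
    """One-sided betting martingale: grows on consistently small p-values."""
    wealth = 1.0
    for p in pvalues:
        wealth *= 1 + eps * (0.5 - p)  # fair bet under uniform p-values
    return wealth

one_sided = [0.05] * 100        # consistent distortion: clearly detected
two_sided = [0.05, 0.95] * 50   # low and high p-values in equal measure

print(martingale(one_sided))    # huge: evidence against exchangeability
print(martingale(two_sided))    # stays at or below 1: the bets cancel out
```

A betting strategy that also wagers on large p-values (or mixes over several `eps` values of both signs) would catch the two-sided distortion, which is exactly why running a battery of tests is safer than relying on any single one.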