Utilities
Utility functions and resampling classes.
mapie.utils.train_conformalize_test_split
train_conformalize_test_split(
    X: NDArray,
    y: NDArray,
    train_size: Union[float, int],
    conformalize_size: Union[float, int],
    test_size: Union[float, int],
    random_state: Optional[int] = None,
    shuffle: bool = True,
) -> Tuple[NDArray, NDArray, NDArray, NDArray, NDArray, NDArray]
Split arrays or matrices into train, conformalization and test subsets.
Utility similar to sklearn.model_selection.train_test_split for splitting data into 3 sets.
We recommend assigning most of the data points to the train set and at least 200 data points to the conformalization set.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes. TYPE: NDArray |
| y | Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes. TYPE: NDArray |
| train_size | If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. TYPE: Union[float, int] |
| conformalize_size | If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the conformalize split. If int, represents the absolute number of conformalize samples. TYPE: Union[float, int] |
| test_size | If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. TYPE: Union[float, int] |
| random_state | Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. TYPE: Optional[int] DEFAULT: None |
| shuffle | Whether or not to shuffle the data before splitting. TYPE: bool DEFAULT: True |
| RETURNS | DESCRIPTION |
|---|---|
| X_train, X_conformalize, X_test, y_train, y_conformalize, y_test | 6 array-like splits of inputs. Output types are the same as the input types. |
Examples:
>>> import numpy as np
>>> from mapie.utils import train_conformalize_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> (
...     X_train, X_conformalize, X_test,
...     y_train, y_conformalize, y_test
... ) = train_conformalize_test_split(
...     X, y, train_size=0.6, conformalize_size=0.2, test_size=0.2, random_state=1
... )
>>> X_train
array([[8, 9],
       [0, 1],
       [6, 7]])
>>> X_conformalize
array([[2, 3]])
>>> X_test
array([[4, 5]])
>>> y_train
[4, 0, 3]
>>> y_conformalize
[1]
>>> y_test
[2]
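The three-way split described above can be pictured as shuffling the row indices once and cutting the permutation at two points. The following is a minimal NumPy-only sketch of that idea, not MAPIE's implementation: the helper name `three_way_split_indices`, the use of `numpy.random.default_rng`, and the rounding rule for float sizes are all assumptions made for illustration, so its shuffles are not expected to match the output shown above.

```python
import numpy as np


def three_way_split_indices(n_samples, train_size, conformalize_size, seed=None):
    """Hypothetical helper: return disjoint train/conformalize/test indices.

    Both sizes are proportions in (0, 1); the remainder goes to the test set.
    """
    if not (0 < train_size < 1 and 0 < conformalize_size < 1):
        raise ValueError("sizes must be proportions strictly between 0 and 1")
    if train_size + conformalize_size >= 1:
        raise ValueError("the two sizes must leave room for a test set")

    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffled row indices

    # Assumed rounding rule: round each proportion to the nearest row count.
    n_train = int(round(train_size * n_samples))
    n_conf = int(round(conformalize_size * n_samples))

    # Cut the permutation at two points to obtain three disjoint parts.
    train, conf, test = np.split(order, [n_train, n_train + n_conf])
    return train, conf, test


train_idx, conf_idx, test_idx = three_way_split_indices(1000, 0.6, 0.2, seed=0)
print(len(train_idx), len(conf_idx), len(test_idx))  # 600 200 200
```

With 1000 rows and a 0.6 / 0.2 / 0.2 split, the conformalization part holds 200 indices, which meets the recommended minimum stated above.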
Source code in mapie/utils.py
mapie.subsample.Subsample
Subsample(
    n_resamplings: int = 30,
    n_samples: Optional[Union[int, float]] = None,
    replace: bool = True,
    random_state: Optional[Union[int, RandomState]] = None,
)
Bases: BaseCrossValidator
Generate a sampling method that resamples the training set, optionally with replacement (bootstrap). It can be used as the cv argument in JackknifeAfterBootstrapRegressor.
| PARAMETER | DESCRIPTION |
|---|---|
| n_resamplings | Number of resamplings. TYPE: int DEFAULT: 30 |
| n_samples | Number of samples in each resampling. TYPE: Optional[Union[int, float]] DEFAULT: None |
| replace | Whether to draw samples with replacement in each resampling. TYPE: bool DEFAULT: True |
| random_state | int or RandomState instance. TYPE: Optional[Union[int, RandomState]] DEFAULT: None |
Examples:
>>> import numpy as np
>>> from mapie.subsample import Subsample
>>> cv = Subsample(n_resamplings=2, random_state=0)
>>> X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> for train_index, test_index in cv.split(X):
...     print(f"train index is {train_index}, test index is {test_index}")
train index is [5 0 3 3 7 9 3 5 2 4], test index is [1 6 8]
train index is [7 6 8 8 1 6 7 7 8 1], test index is [0 2 3 4 5 9]
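In the output above, each test set consists of exactly the indices that the corresponding draw left out. A minimal NumPy sketch of that resampling idea follows. It is an illustration under stated assumptions rather than the library's code: the generator name `bootstrap_splits` is hypothetical, `n_samples` is handled only as an integer count (or None, meaning "as many as the data"), and because it relies on `numpy.random.default_rng` its draws are not expected to reproduce the indices printed above.

```python
import numpy as np


def bootstrap_splits(n_total, n_resamplings=2, n_samples=None, replace=True, seed=None):
    """Hypothetical generator yielding (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    all_indices = np.arange(n_total)
    # Assumption: with n_samples=None, draw as many indices as there are rows.
    size = n_total if n_samples is None else n_samples
    for _ in range(n_resamplings):
        # Draw the training indices, with or without replacement.
        train = rng.choice(all_indices, size=size, replace=replace)
        # The test indices are those never drawn in this resampling.
        test = np.setdiff1d(all_indices, train)
        yield train, test


splits = list(bootstrap_splits(10, n_resamplings=3, seed=0))
for train, test in splits:
    print(f"train index is {train}, test index is {test}")
```

Sampling with replacement is what makes a non-empty left-out set likely; with `replace=False` and `n_samples=None`, every index would be drawn and the test set in this sketch would be empty.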
Source code in mapie/subsample.py
split
Generate indices to split data into training and test sets.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Training data. |

| YIELDS | DESCRIPTION |
|---|---|
| train | The training set indices for that split. |
| test | The testing set indices for that split. |
Source code in mapie/subsample.py
get_n_splits
Returns the number of splitting iterations in the cross-validator.
| RETURNS | DESCRIPTION |
|---|---|
| int | The number of splitting iterations in the cross-validator. |
Source code in mapie/subsample.py
mapie.subsample.BlockBootstrap
BlockBootstrap(
    n_resamplings: int = 30,
    length: Optional[int] = None,
    n_blocks: Optional[int] = None,
    overlapping: bool = False,
    random_state: Optional[Union[int, RandomState]] = None,
)
Bases: BaseCrossValidator
Generate a sampling method that block-bootstraps the training set. It can replace KFold, LeaveOneOut or Subsample as the cv argument in the TimeSeriesRegressor class.
| PARAMETER | DESCRIPTION |
|---|---|
| n_resamplings | Number of resamplings. TYPE: int DEFAULT: 30 |
| length | Length of the blocks. TYPE: Optional[int] DEFAULT: None |
| overlapping | Whether the blocks can overlap or not. TYPE: bool DEFAULT: False |
| n_blocks | Number of blocks in each resampling. TYPE: Optional[int] DEFAULT: None |
| random_state | int or RandomState instance. TYPE: Optional[Union[int, RandomState]] DEFAULT: None |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If both |
Examples:
>>> import numpy as np
>>> from mapie.subsample import BlockBootstrap
>>> cv = BlockBootstrap(n_resamplings=2, length=3, random_state=0)
>>> X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> for train_index, test_index in cv.split(X):
...     print(f"train index is {train_index}, test index is {test_index}")
train index is [0 1 2 3 4 5 0 1 2 3 4 5], test index is [8 9 6 7]
train index is [3 4 5 6 7 8 0 1 2 6 7 8], test index is [9]
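The train indices printed above are concatenations of length-3 runs of consecutive indices, and each test set holds the indices that no drawn block covers. The sketch below illustrates that block-resampling idea with NumPy alone. It is a simplified illustration under stated assumptions, not MAPIE's implementation: `block_bootstrap_indices` is a hypothetical name, the number of blocks to draw is passed explicitly, and the random draws come from `numpy.random.default_rng`, so they are not expected to match the output shown above.

```python
import numpy as np


def block_bootstrap_indices(n_total, length, n_blocks, overlapping=False, seed=None):
    """Hypothetical helper: draw n_blocks blocks of consecutive indices."""
    if not 0 < length <= n_total:
        raise ValueError("length must be between 1 and n_total")
    rng = np.random.default_rng(seed)
    # Overlapping blocks may start at any index; otherwise starts are
    # spaced `length` apart. A trailing remainder shorter than `length`
    # is never used as a block.
    step = 1 if overlapping else length
    starts = np.arange(0, n_total - length + 1, step)
    # Sample block start positions with replacement.
    chosen = rng.choice(starts, size=n_blocks, replace=True)
    # Expand each start into `length` consecutive indices and flatten.
    train = (chosen[:, None] + np.arange(length)[None, :]).ravel()
    # Indices not covered by any drawn block form the test set.
    test = np.setdiff1d(np.arange(n_total), train)
    return train, test


train, test = block_bootstrap_indices(10, length=3, n_blocks=4, seed=0)
print(train.reshape(-1, 3))  # one drawn block per row
print(test)
```

Drawing whole blocks rather than single indices keeps neighbouring observations together, which is why this style of resampling is typically paired with time-ordered data.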
Source code in mapie/subsample.py
split
Generate indices to split data into training and test sets.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Training data. |

| YIELDS | DESCRIPTION |
|---|---|
| train | The training set indices for that split. |
| test | The testing set indices for that split. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Source code in mapie/subsample.py
get_n_splits
Returns the number of splitting iterations in the cross-validator.
| RETURNS | DESCRIPTION |
|---|---|
| int | The number of splitting iterations in the cross-validator. |