metric_learn.base_metric._PairsClassifierMixin
- class metric_learn.base_metric._PairsClassifierMixin(preprocessor=None)
Base class for pairs learners.
- Attributes:
- threshold_ : float
If the distance metric between two points is lower than this threshold, points will be classified as similar, otherwise they will be classified as dissimilar.
Methods
- calibrate_threshold(pairs_valid, y_valid[, ...]): Decision threshold calibration for pairwise binary classification.
- decision_function(pairs): Returns the decision function used to classify the pairs.
- get_metadata_routing(): Get metadata routing of this object.
- get_metric(): Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points.
- get_params([deep]): Get parameters for this estimator.
- pair_distance(pairs): New in version 0.7.0. Compute the distance between pairs.
- pair_score(pairs): New in version 0.7.0. Compute the similarity score between pairs.
- predict(pairs): Predicts the learned metric between input pairs.
- score(pairs, y): Computes score of pairs similarity prediction.
- score_pairs(pairs): Returns the score between pairs (can be a similarity, or a distance/metric depending on the algorithm).
- set_decision_function_request(*[, pairs]): Request metadata passed to the decision_function method.
- set_params(**params): Set the parameters of this estimator.
- set_predict_request(*[, pairs]): Request metadata passed to the predict method.
- set_score_request(*[, pairs]): Request metadata passed to the score method.
- set_threshold(threshold): Sets the threshold of the metric learner to the given value threshold.
- __init__(preprocessor=None)
- calibrate_threshold(pairs_valid, y_valid, strategy='accuracy', min_rate=None, beta=1.0)
Decision threshold calibration for pairwise binary classification
Method that calibrates the decision threshold (cutoff point) of the metric learner. This threshold will then be used when calling the method predict. The methods for picking cutoff points make use of traditional binary classification evaluation statistics such as the true positive and true negative rates and F-scores. The threshold is chosen to maximize the selected score on the validation set (pairs_valid, y_valid).
See more in the User Guide.
- Parameters:
- strategy : str, optional (default='accuracy')
The strategy to use for choosing the cutoff threshold.
- 'accuracy'
Selects a decision threshold that maximizes the accuracy.
- 'f_beta'
Selects a decision threshold that maximizes the f_beta score, with beta given by the parameter beta.
- 'max_tpr'
Selects a decision threshold that yields the highest true positive rate with true negative rate at least equal to the value of the parameter min_rate.
- 'max_tnr'
Selects a decision threshold that yields the highest true negative rate with true positive rate at least equal to the value of the parameter min_rate.
- beta : float in [0, 1], optional (default=1.0)
Beta value to be used in case strategy == 'f_beta'.
- min_rate : float in [0, 1] or None, optional (default=None)
In case strategy is 'max_tpr' or 'max_tnr' this parameter must be set to specify the minimal value for the true negative rate or true positive rate respectively that needs to be achieved.
- pairs_valid : array-like, shape=(n_pairs_valid, 2, n_features)
The validation set of pairs to use to set the threshold.
- y_valid : array-like, shape=(n_pairs_valid,)
The labels of the pairs of the validation set to use to set the threshold. They must be +1 for positive pairs and -1 for negative pairs.
See also
sklearn.calibration
scikit-learn’s module for calibrating classifiers
References
[1] Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, MH Zweig, G Campbell, Clinical Chemistry, 1993
[2] Most of the code of this function is from scikit-learn’s PR #10117
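The 'accuracy', 'max_tpr' and 'max_tnr' strategies described above can be illustrated with a small dependency-free sketch. It is not metric-learn's implementation: the helper names rates and toy_calibrate are hypothetical, the candidate thresholds are simply the observed distances, a distance equal to the threshold is assumed to count as similar, and the 'f_beta' strategy is left out.

```python
def rates(distances, labels, threshold):
    """Return (accuracy, tpr, tnr) when distance <= threshold means 'similar' (+1)."""
    tp = tn = fp = fn = 0
    for d, y in zip(distances, labels):
        predicted = 1 if d <= threshold else -1
        if y == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == -1:
                tn += 1
            else:
                fp += 1
    accuracy = (tp + tn) / len(labels)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, tpr, tnr


def toy_calibrate(distances, labels, strategy="accuracy", min_rate=None):
    """Pick the candidate threshold that maximizes the chosen criterion."""
    if strategy in ("max_tpr", "max_tnr") and min_rate is None:
        raise ValueError("min_rate must be set for 'max_tpr' and 'max_tnr'")
    observed = sorted(set(distances))
    # Candidates: one value below every distance (nothing is similar),
    # then each observed distance in increasing order.
    candidates = [observed[0] - 1.0] + observed
    best_threshold, best_value = None, None
    for t in candidates:
        accuracy, tpr, tnr = rates(distances, labels, t)
        if strategy == "accuracy":
            value = accuracy
        elif strategy == "max_tpr":
            if tnr < min_rate:  # constraint on the true negative rate
                continue
            value = tpr
        elif strategy == "max_tnr":
            if tpr < min_rate:  # constraint on the true positive rate
                continue
            value = tnr
        else:
            raise ValueError(f"unsupported strategy: {strategy!r}")
        # Strict comparison: on ties, the smallest threshold is kept.
        if best_value is None or value > best_value:
            best_threshold, best_value = t, value
    return best_threshold


# Validation distances and their labels (+1 similar, -1 dissimilar).
distances = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
labels = [1, 1, 1, -1, -1, -1]
threshold_accuracy = toy_calibrate(distances, labels, strategy="accuracy")
threshold_max_tpr = toy_calibrate(distances, labels, strategy="max_tpr", min_rate=0.6)
```

On this toy data the two strategies pick different cutoffs, because 'max_tpr' accepts a lower true negative rate (as long as it stays above min_rate) in exchange for recovering every positive pair.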
- classes_ = array([0, 1])
- decision_function(pairs)
Returns the decision function used to classify the pairs.
Returns the opposite of the learned metric value between samples in every pair, to be consistent with scikit-learn conventions. Hence it should ideally be low for dissimilar samples and high for similar samples. This is the decision function that is used to classify pairs as similar (+1), or dissimilar (-1).
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to predict, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- y_predicted : numpy.ndarray of floats, shape=(n_constraints,)
The predicted decision function value for each pair.
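As a rough illustration of the sign convention, the sketch below uses the plain Euclidean distance as a stand-in for a learned metric and negates it. The function names are hypothetical and this is not the library's code.

```python
import math


def toy_pair_distance(pairs):
    """Euclidean distance of each pair (stand-in for a learned metric)."""
    return [math.dist(a, b) for a, b in pairs]


def toy_decision_function(pairs):
    """Opposite of the distance: higher means more similar."""
    return [-d for d in toy_pair_distance(pairs)]


pairs = [
    ((0.0, 0.0), (0.0, 1.0)),  # two close points
    ((0.0, 0.0), (3.0, 4.0)),  # two distant points
]
scores = toy_decision_function(pairs)
```

The close pair gets the higher decision value, which matches the scikit-learn convention that larger decision values point to the positive ("similar") class.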
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- abstract get_metric()
Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points. Depending on the algorithm, it can return a distance or a similarity function between pairs.
This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn’s estimators.
- Returns:
- metric_fun : function
The function described above.
See also
pair_distance
a method that returns the distance between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.
pair_score
a method that returns the similarity score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.
Examples
>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
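The independence property described above (the returned function is not modified when the learner changes) can be pictured with a closure that snapshots the learner's parameters. This is only a sketch under that assumption: ToyScaledLearner and its weights attribute are hypothetical and do not exist in metric-learn.

```python
import math


class ToyScaledLearner:
    """Hypothetical learner: per-feature weighted Euclidean distance."""

    def __init__(self, weights):
        self.weights = list(weights)

    def get_metric(self):
        frozen = tuple(self.weights)  # snapshot taken when get_metric is called

        def metric_fun(u, v):
            # Uses the frozen copy, not self.weights.
            return math.sqrt(
                sum(w * (a - b) ** 2 for w, a, b in zip(frozen, u, v))
            )

        return metric_fun


learner = ToyScaledLearner([1.0, 4.0])
metric = learner.get_metric()
before = metric((0.0, 0.0), (3.0, 2.0))

learner.weights[1] = 0.0  # modify the learner afterwards
after = metric((0.0, 0.0), (3.0, 2.0))  # old function: unchanged result
refreshed = learner.get_metric()((0.0, 0.0), (3.0, 2.0))  # new function: updated
```

Because the function only depends on its own snapshot, it can be handed to another estimator without that estimator being affected by later changes to the learner.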
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- abstract pair_distance(pairs)
New in version 0.7.0: Compute the distance between pairs
Returns the (pseudo) distance between pairs, when available. For metric learners that do not learn a (pseudo) distance, an error is thrown instead.
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs for which to compute the distance, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores : numpy.ndarray of shape=(n_pairs,)
The distance between every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with pair_distance is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
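The two accepted input forms (pairs of points, or pairs of indices resolved through a preprocessor) can be sketched as follows. The sketch assumes the preprocessor is a plain indexable collection of points and again uses the Euclidean distance as a stand-in. resolve_pairs and toy_pair_distance are hypothetical names.

```python
import math


def resolve_pairs(pairs, preprocessor=None):
    """Map pairs of indices to pairs of points when a preprocessor is given."""
    if preprocessor is None:
        return pairs  # already pairs of points
    return [(preprocessor[i], preprocessor[j]) for i, j in pairs]


def toy_pair_distance(pairs, preprocessor=None):
    """Distance of every pair, accepting either input form."""
    return [math.dist(a, b) for a, b in resolve_pairs(pairs, preprocessor)]


X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

pairs_of_points = [(X[0], X[1]), (X[1], X[2])]  # shape (n_pairs, 2, n_features)
pairs_of_indices = [(0, 1), (1, 2)]             # shape (n_pairs, 2)

from_points = toy_pair_distance(pairs_of_points)
from_indices = toy_pair_distance(pairs_of_indices, preprocessor=X)
```

Both calls describe the same pairs, so they yield the same distances. The index form avoids duplicating points that appear in many pairs.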
- abstract pair_score(pairs)
New in version 0.7.0: Compute the similarity score between pairs
Returns the similarity score between pairs of points (the larger the score, the more similar the pair). For metric learners that learn a distance, the score is simply the opposite of the distance between pairs. All learners have access to this method.
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores : numpy.ndarray of shape=(n_pairs,)
The score of every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with pair_score is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
- predict(pairs)
Predicts the class (similar or dissimilar) of each input pair.
Compares the learned metric value between the samples of each pair to the threshold threshold_: pairs whose distance is lower than the threshold are classified as similar (+1), the others as dissimilar (-1). The threshold is the one set with set_threshold or calibrate_threshold.
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to predict, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- y_predicted : numpy.ndarray, shape=(n_constraints,)
The predicted class of each pair: +1 (similar) or -1 (dissimilar).
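The threshold rule given in the description of the threshold_ attribute can be written as a one-line sketch. The name toy_predict is hypothetical, and treating a distance exactly equal to the threshold as similar is an assumption of this sketch.

```python
def toy_predict(distances, threshold):
    """Classify each pair: +1 (similar) if its distance is within the threshold, else -1."""
    return [1 if d <= threshold else -1 for d in distances]


# Distances of three pairs compared against a threshold of 2.0.
predictions = toy_predict([1.0, 5.0, 2.0], threshold=2.0)
```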
- score(pairs, y)
Computes score of pairs similarity prediction.
Returns the roc_auc score of the fitted metric learner. It is computed in the following way: for every value of a threshold t, all pairs of samples whose predicted distance is lower than t are classified as belonging to the “similar” class, and the others as belonging to the “dissimilar” class; false positives and true positives are then counted as in a classical roc_auc curve.
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- y : array-like, shape=(n_constraints,)
The corresponding labels.
- Returns:
- score : float
The roc_auc score.
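The threshold sweep described above can be sketched in plain Python: each distinct distance is used as a threshold, the resulting (false positive rate, true positive rate) points are collected, and the area under them is integrated with the trapezoidal rule. The function name is hypothetical, and the library itself may compute the score differently (for example through scikit-learn's metrics).

```python
def roc_auc_from_distances(distances, labels):
    """Area under the ROC curve obtained by sweeping a threshold over the distances."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both similar and dissimilar pairs are required")

    # Start with a threshold below every distance: nothing is classified as similar.
    points = [(0.0, 0.0)]
    for t in sorted(set(distances)):
        tp = sum(1 for d, y in zip(distances, labels) if d <= t and y == 1)
        fp = sum(1 for d, y in zip(distances, labels) if d <= t and y != 1)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR) at threshold t

    # Trapezoidal integration of TPR over FPR.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


distances = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
labels = [1, 1, 1, -1, -1, -1]
auc = roc_auc_from_distances(distances, labels)
```

In this data, 8 of the 9 (similar, dissimilar) combinations have the similar pair at the smaller distance, so the area equals 8/9. Perfectly separated distances would give 1.0.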
- abstract score_pairs(pairs)
Returns the score between pairs (can be a similarity, or a distance/metric depending on the algorithm)
Deprecated since version 0.7.0: Refer to pair_distance and pair_score.
Warning
This method will be removed in 0.8.0. Please refer to pair_distance or pair_score. This change will occur in order to add learners that don’t necessarily learn a Mahalanobis distance.
- Parameters:
- pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D Array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores : numpy.ndarray of shape=(n_pairs,)
The score of every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
- set_decision_function_request(*, pairs: bool | None | str = '$UNCHANGED$') -> _PairsClassifierMixin
Request metadata passed to the decision_function method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to decision_function if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to decision_function.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- pairs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for pairs parameter in decision_function.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_predict_request(*, pairs: bool | None | str = '$UNCHANGED$') -> _PairsClassifierMixin
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- pairs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for pairs parameter in predict.
- Returns:
- self : object
The updated object.
- set_score_request(*, pairs: bool | None | str = '$UNCHANGED$') -> _PairsClassifierMixin
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- pairs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for pairs parameter in score.
- Returns:
- self : object
The updated object.
- set_threshold(threshold)
Sets the threshold of the metric learner to the given value threshold.
See more in the User Guide.
- Parameters:
- threshold : float
The threshold value we want to set. It is the value to which the predicted distance for test pairs will be compared. Pairs whose distance is lower than the threshold will be classified as similar (+1), and the others as dissimilar (-1).
- Returns:
- self : _PairsClassifier
The pairs classifier with the new threshold set.