metric_learn.ITML

class metric_learn.ITML(gamma=1.0, max_iter=1000, convergence_threshold=0.001, prior='identity', verbose=False, preprocessor=None, random_state=None)
Information Theoretic Metric Learning (ITML)
ITML minimizes the (differential) relative entropy, also known as the Kullback-Leibler divergence, between two multivariate Gaussians subject to constraints on the associated Mahalanobis distance. This can be formulated as a Bregman optimization problem: minimizing the LogDet divergence subject to linear constraints. The algorithm can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Unlike some other methods, ITML does not rely on an eigenvalue computation or semidefinite programming.
Read more in the User Guide.
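As a rough illustration of the objective mentioned above, the LogDet divergence between a matrix M and a prior M0 is commonly written as tr(M M0^-1) - log det(M M0^-1) - n. The sketch below only evaluates that formula with NumPy; the function name `logdet_divergence` is hypothetical and is not part of metric_learn.

```python
import numpy as np

def logdet_divergence(M, M0):
    """LogDet divergence tr(M M0^-1) - log det(M M0^-1) - n (illustrative helper)."""
    n = M.shape[0]
    P = M @ np.linalg.inv(M0)
    # slogdet returns (sign, log|det|); for positive definite inputs the sign is +1
    sign, logabsdet = np.linalg.slogdet(P)
    return np.trace(P) - logabsdet - n

M0 = np.eye(2)                      # prior (identity)
M = np.array([[2.0, 0.0],
              [0.0, 2.0]])          # candidate Mahalanobis matrix

d_same = logdet_divergence(M0, M0)  # divergence of the prior from itself
d_scaled = logdet_divergence(M, M0) # divergence of 2*I from I
```

The divergence is zero when M equals the prior and grows as M moves away from it, which is why the prior acts as a regularizer.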
Parameters:
gamma : float, optional (default=1.0)
    Value for slack variables.
max_iter : int, optional (default=1000)
    Maximum number of iterations of the optimization procedure.
convergence_threshold : float, optional (default=1e-3)
    Convergence tolerance.
prior : string or numpy array, optional (default='identity')
    The Mahalanobis matrix to use as a prior. Possible options are 'identity', 'covariance', 'random', and a numpy array of shape (n_features, n_features). For ITML, the prior should be strictly positive definite (PD).
    'identity'
        An identity matrix of shape (n_features, n_features).
    'covariance'
        The inverse covariance matrix.
    'random'
        The prior will be a random SPD matrix of shape (n_features, n_features), generated using sklearn.datasets.make_spd_matrix.
    numpy array
        A positive definite (PD) matrix of shape (n_features, n_features), that will be used as such to set the prior.
verbose : bool, optional (default=False)
    If True, prints information while learning.
preprocessor : array-like, shape=(n_samples, n_features) or callable
    The preprocessor to call to get tuples from indices. If array-like, tuples will be formed like this: X[indices].
random_state : int or numpy.RandomState or None, optional (default=None)
    A pseudo random number generator object or a seed for it if int. If prior='random', random_state is used to set the prior.
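Since the prior must be strictly positive definite, it can help to check a candidate matrix before passing it as `prior`. The following is a minimal sketch, assuming a Cholesky factorization as the PD test; the helper `is_positive_definite` and the sample data are hypothetical, and this is not how metric_learn itself validates the prior.

```python
import numpy as np

def is_positive_definite(A):
    """Return True if the symmetric matrix A is strictly positive definite."""
    try:
        np.linalg.cholesky(A)  # succeeds only for positive definite matrices
        return True
    except np.linalg.LinAlgError:
        return False

rng = np.random.RandomState(0)
X = rng.randn(50, 3)                      # hypothetical data, 3 features

# Matrices analogous to the 'identity' and 'covariance' options
prior_identity = np.eye(X.shape[1])
prior_covariance = np.linalg.inv(np.cov(X, rowvar=False))

# A symmetric matrix with a negative eigenvalue: not a valid prior
not_pd = np.array([[1.0, 2.0, 0.0],
                   [2.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
```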
References
[1] Jason V. Davis, et al. Information-theoretic Metric Learning. ICML 2007.
Examples
>>> from metric_learn import ITML
>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
...          [[6.4, 2.6], [6.2, 9.7]],
...          [[1.3, 4.5], [3.2, 4.6]],
...          [[6.2, 5.5], [5.4, 5.4]]]
>>> y = [1, 1, -1, -1]
>>> # in this task we want points whose first features are close to be
>>> # closer to each other, no matter how close the second feature is
>>> itml = ITML()
>>> itml.fit(pairs, y)
Attributes:
bounds_ : numpy.ndarray, shape=(2,)
    Bounds on similarity, aside slack variables, s.t. d(a, b) < bounds_[0] for all given pairs of similar points a and b, and d(c, d) > bounds_[1] for all given pairs of dissimilar points c and d, with d the learned distance. If not provided, bounds_[0] and bounds_[1] are set at train time to the 5th and 95th percentile of the pairwise distances among all points present in the input pairs.
n_iter_ : int
    The number of iterations the solver has run.
components_ : numpy.ndarray, shape=(n_features, n_features)
    The linear transformation L deduced from the learned Mahalanobis metric (see function components_from_metric).
threshold_ : float
    If the distance metric between two points is lower than this threshold, points will be classified as similar, otherwise they will be classified as dissimilar.
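The default for `bounds_` described above can be approximated by hand. This sketch assumes plain Euclidean distances between the distinct points found in `pairs` (the text does not state which distance is used), so treat it as an illustration rather than the library's exact computation.

```python
import numpy as np

pairs = np.array([[[1.2, 7.5], [1.3, 1.5]],
                  [[6.4, 2.6], [6.2, 9.7]],
                  [[1.3, 4.5], [3.2, 4.6]],
                  [[6.2, 5.5], [5.4, 5.4]]])

# Collect every distinct point that appears in any pair
points = np.unique(pairs.reshape(-1, pairs.shape[2]), axis=0)

# All pairwise Euclidean distances (assumption), keeping each unordered pair once
diffs = points[:, None, :] - points[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=2))
pairwise = dists[np.triu_indices(len(points), k=1)]

# 5th and 95th percentiles play the role of bounds_[0] and bounds_[1]
bounds = np.percentile(pairwise, [5, 95])
```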
Methods
calibrate_threshold(pairs_valid, y_valid[, …])
    Decision threshold calibration for pairwise binary classification.
decision_function(pairs)
    Returns the decision function used to classify the pairs.
fit(pairs, y[, bounds, calibration_params])
    Learn the ITML model.
get_mahalanobis_matrix()
    Returns a copy of the Mahalanobis matrix learned by the metric learner.
get_metric()
    Returns a function that takes as input two 1D arrays and outputs the learned metric score on these two points.
get_params([deep])
    Get parameters for this estimator.
predict(pairs)
    Predicts the learned metric between input pairs.
score(pairs, y)
    Computes score of pairs similarity prediction.
score_pairs(pairs)
    Returns the learned Mahalanobis distance between pairs.
set_params(**params)
    Set the parameters of this estimator.
set_threshold(threshold)
    Sets the threshold of the metric learner to the given value threshold.
transform(X)
    Embeds data points in the learned linear embedding space.
__init__(gamma=1.0, max_iter=1000, convergence_threshold=0.001, prior='identity', verbose=False, preprocessor=None, random_state=None)
Initialize self. See help(type(self)) for accurate signature.

calibrate_threshold(pairs_valid, y_valid, strategy='accuracy', min_rate=None, beta=1.0)
Decision threshold calibration for pairwise binary classification
Method that calibrates the decision threshold (cutoff point) of the metric learner. This threshold will then be used when calling the method predict. The methods for picking cutoff points make use of traditional binary classification evaluation statistics such as the true positive and true negative rates and F-scores. The threshold will be found to maximize the chosen score on the validation set (pairs_valid, y_valid).
See more in the User Guide.
Parameters:
strategy : str, optional (default='accuracy')
    The strategy to use for choosing the cutoff threshold.
    'accuracy'
        Selects a decision threshold that maximizes the accuracy.
    'f_beta'
        Selects a decision threshold that maximizes the f_beta score, with beta given by the parameter beta.
    'max_tpr'
        Selects a decision threshold that yields the highest true positive rate with true negative rate at least equal to the value of the parameter min_rate.
    'max_tnr'
        Selects a decision threshold that yields the highest true negative rate with true positive rate at least equal to the value of the parameter min_rate.
beta : float in [0, 1], optional (default=1.0)
    Beta value to be used in case strategy == 'f_beta'.
min_rate : float in [0, 1] or None, (default=None)
    In case strategy is 'max_tpr' or 'max_tnr' this parameter must be set to specify the minimal value for the true negative rate or true positive rate respectively that needs to be achieved.
pairs_valid : array-like, shape=(n_pairs_valid, 2, n_features)
    The validation set of pairs to use to set the threshold.
y_valid : array-like, shape=(n_pairs_valid,)
    The labels of the pairs of the validation set to use to set the threshold. They must be +1 for positive pairs and -1 for negative pairs.
See also
sklearn.calibration
    scikit-learn's module for calibrating classifiers
References
[1] Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, MH Zweig, G Campbell, Clinical Chemistry, 1993
[2] most of the code of this function is from scikit-learn's PR #10117
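To make the 'accuracy' strategy concrete, here is a simplified sketch that tries each observed distance as a cutoff and keeps the one with the best validation accuracy. The function `calibrate_accuracy_threshold`, the use of `<=` for ties, and the sample distances are all assumptions made for illustration; the library's actual routine is more elaborate.

```python
import numpy as np

def calibrate_accuracy_threshold(distances, y):
    """Pick the distance cutoff that maximizes accuracy (illustrative only)."""
    distances = np.asarray(distances, dtype=float)
    y = np.asarray(y)
    best_threshold, best_accuracy = None, -1.0
    # Candidate cutoffs: every distinct distance seen in the validation set
    for t in np.unique(distances):
        # Pairs at or below the cutoff are predicted similar (+1), others dissimilar (-1)
        predictions = np.where(distances <= t, 1, -1)
        accuracy = float(np.mean(predictions == y))
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = float(t), accuracy
    return best_threshold, best_accuracy

# Hypothetical learned distances for four validation pairs and their labels
distances_valid = [0.2, 0.4, 0.9, 1.5]
y_valid = [1, 1, -1, -1]
threshold, accuracy = calibrate_accuracy_threshold(distances_valid, y_valid)
```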

decision_function(pairs)
Returns the decision function used to classify the pairs.
Returns the opposite of the learned metric value between samples in every pair, to be consistent with scikit-learn conventions. Hence it should ideally be low for dissimilar samples and high for similar samples. This is the decision function that is used to classify pairs as similar (+1), or dissimilar (-1).
Parameters:
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
    3D array of pairs to predict, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
Returns:
y_predicted : numpy.ndarray of floats, shape=(n_constraints,)
    The predicted decision function value for each pair.
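The sign convention can be illustrated with a few made-up distances: negating the distance gives a score that is high for similar pairs, and comparing the distance with a threshold (as described for the threshold_ attribute) yields +1 or -1 labels. The values below are hypothetical.

```python
import numpy as np

# Hypothetical learned distances for three pairs
distances = np.array([0.3, 2.0, 0.8])

# Decision values are the opposite of the distances: higher means more similar
decision_values = -distances

# Distances below the threshold are labeled similar (+1), the rest dissimilar (-1)
threshold = 1.0
labels = np.where(distances < threshold, 1, -1)
```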

fit(pairs, y, bounds=None, calibration_params=None)
Learn the ITML model.
The threshold will be calibrated on the training set using the parameters calibration_params.
Parameters:
pairs : array-like, shape=(n_constraints, 2, n_features) or (n_constraints, 2)
    3D array of pairs with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
y : array-like, of shape (n_constraints,)
    Labels of constraints. Should be -1 for dissimilar pairs, 1 for similar pairs.
bounds : array-like of two numbers
    Bounds on similarity, aside slack variables, s.t. d(a, b) < bounds_[0] for all given pairs of similar points a and b, and d(c, d) > bounds_[1] for all given pairs of dissimilar points c and d, with d the learned distance. If not provided, bounds_[0] and bounds_[1] will be set to the 5th and 95th percentile of the pairwise distances among all points present in the input pairs.
calibration_params : dict or None
    Dictionary of parameters to give to calibrate_threshold for the threshold calibration step done at the end of fit. If None is given, calibrate_threshold will use the default parameters.
Returns:
self : object
    Returns the instance.
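The two accepted input layouts for `pairs` are related by simple indexing: with an array preprocessor, a 2D array of indices is expanded as X[indices] into the 3D array of points. A small NumPy sketch with made-up data and labels:

```python
import numpy as np

# Hypothetical dataset: 4 points with 2 features each
X = np.array([[1.2, 7.5],
              [1.3, 1.5],
              [6.4, 2.6],
              [6.2, 9.7]])

# 2D array of index pairs, shape (n_constraints, 2)
indices = np.array([[0, 1],
                    [2, 3],
                    [0, 2]])

# Fancy indexing builds the 3D array of shape (n_constraints, 2, n_features)
pairs = X[indices]

# One label per pair: 1 for similar, -1 for dissimilar (hypothetical labels)
y = np.array([1, 1, -1])
```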

get_mahalanobis_matrix()
Returns a copy of the Mahalanobis matrix learned by the metric learner.
Returns:
M : numpy.ndarray, shape=(n_features, n_features)
    The copy of the learned Mahalanobis matrix.

get_metric()
Returns a function that takes as input two 1D arrays and outputs the learned metric score on these two points.
This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn's estimators.
Returns:
metric_fun : function
    The function described above.
See also
score_pairs
    a method that returns the metric score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner's preprocessor, and works on concatenated arrays.
Examples
>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun
          at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2,
  weights='uniform')
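The independence property described above can be pictured as a closure over a private copy of the Mahalanobis matrix. The sketch below is an assumption about how such a function could be built (the name `make_metric` is hypothetical), not the library's source.

```python
import numpy as np

def make_metric(M):
    """Return a distance function that keeps its own copy of M (illustrative)."""
    M_frozen = np.array(M, dtype=float, copy=True)  # later changes to M are not seen

    def metric_fun(u, v):
        diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
        return float(np.sqrt(diff @ M_frozen @ diff))

    return metric_fun

M = np.array([[2.0, 0.0],
              [0.0, 0.5]])
metric = make_metric(M)

before = metric([0.0, 0.0], [1.0, 2.0])
M[0, 0] = 100.0                      # modify the original matrix afterwards
after = metric([0.0, 0.0], [1.0, 2.0])
```

Because the closure holds a copy, `before` and `after` are the same value even though `M` was modified in between.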

get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
    Parameter names mapped to their values.

predict(pairs)
Predicts the learned metric between input pairs. (For now it just calls decision function).
Returns the learned metric value between samples in every pair. It should ideally be low for similar samples and high for dissimilar samples.
Parameters:
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
    3D array of pairs to predict, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
Returns:
y_predicted : numpy.ndarray of floats, shape=(n_constraints,)
    The predicted learned metric value between samples in every pair.

score(pairs, y)
Computes score of pairs similarity prediction.
Returns the roc_auc score of the fitted metric learner. It is computed in the following way: for every value of a threshold t we classify all pairs of samples where the predicted distance is lower than t as belonging to the "similar" class, and the others as belonging to the "dissimilar" class, and we count false positives and true positives as in a classical roc_auc curve.
Parameters:
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
    3D array of pairs, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
y : array-like, shape=(n_constraints,)
    The corresponding labels.
Returns:
score : float
    The roc_auc score.
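Sweeping the threshold t over all values and taking the area under the resulting ROC curve is equivalent to measuring how often a similar pair has a smaller distance than a dissimilar pair (ties counted as one half). The sketch below computes the score that way; `pairwise_roc_auc` and the sample values are hypothetical, and the library itself relies on scikit-learn for this computation.

```python
import numpy as np

def pairwise_roc_auc(distances, y):
    """ROC AUC for distance-based similarity via rank comparison (illustrative)."""
    distances = np.asarray(distances, dtype=float)
    y = np.asarray(y)
    similar = distances[y == 1]
    dissimilar = distances[y == -1]
    # Compare every similar pair's distance with every dissimilar pair's distance
    diff = dissimilar[None, :] - similar[:, None]
    wins = (diff > 0).sum()    # similar pair is closer: correctly ranked
    ties = (diff == 0).sum()   # equal distances count as half
    return (wins + 0.5 * ties) / diff.size

# Hypothetical distances and labels for five pairs
auc = pairwise_roc_auc([0.2, 0.4, 0.9, 1.5, 0.3], [1, 1, -1, -1, -1])
```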

score_pairs(pairs)
Returns the learned Mahalanobis distance between pairs.
This distance is defined as: \(d_M(x, x') = \sqrt{(x - x')^T M (x - x')}\) where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we have also: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e - x_e')}\), with \(x_e = L x\) (see MahalanobisMixin).
Parameters:
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
    3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
Returns:
scores : numpy.ndarray of shape=(n_pairs,)
    The learned Mahalanobis distance for every pair.
See also
get_metric
    a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
Mahalanobis Distances
    The section of the project documentation that describes Mahalanobis Distances.
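The equivalence between the two formulas can be checked numerically. This sketch uses a hand-picked positive definite M and obtains L from a Cholesky factorization so that L^T L = M; that choice of L is an assumption for illustration and may differ from what components_from_metric returns.

```python
import numpy as np

# Hypothetical positive definite Mahalanobis matrix
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
pairs = np.array([[[1.2, 7.5], [1.3, 1.5]],
                  [[6.4, 2.6], [6.2, 9.7]]])

# Direct formula: sqrt((x - x')^T M (x - x')) for each pair
diffs = pairs[:, 0, :] - pairs[:, 1, :]
d_from_M = np.sqrt(np.einsum('ij,jk,ik->i', diffs, M, diffs))

# cholesky returns C with C C^T = M, so L = C^T satisfies L^T L = M
L = np.linalg.cholesky(M).T

# Embed every point (x_e = L x, i.e. row vectors times L^T), then use Euclidean distance
embedded = pairs @ L.T
d_from_L = np.linalg.norm(embedded[:, 0, :] - embedded[:, 1, :], axis=1)
```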

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
    Estimator parameters.
Returns:
self : object
    Estimator instance.

set_threshold(threshold)
Sets the threshold of the metric learner to the given value threshold.
See more in the User Guide.
Parameters:
threshold : float
    The threshold value we want to set. It is the value to which the predicted distance for test pairs will be compared. Pairs whose distance is lower than the threshold will be classified as similar (+1), and as dissimilar (-1) otherwise, consistent with the threshold_ attribute.
Returns:
self : _PairsClassifier
    The pairs classifier with the new threshold set.

transform(X)
Embeds data points in the learned linear embedding space.
Transforms samples in X into X_embedded, samples inside a new embedding space such that: X_embedded = X.dot(L.T), where L is the learned linear transformation (see MahalanobisMixin).
Parameters:
X : numpy.ndarray, shape=(n_samples, n_features)
    The data points to embed.
Returns:
X_embedded : numpy.ndarray, shape=(n_samples, n_components)
    The embedded data points.
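The embedding formula X_embedded = X.dot(L.T) applies L to each row of X. The sketch below uses a made-up, non-square L purely to show how the shapes line up; a fitted ITML model would supply its own L through components_.

```python
import numpy as np

# Hypothetical linear transformation, shape (n_components, n_features) = (2, 3)
L = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

# Two samples with three features each
X = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 4.0]])

# Each row of X_embedded is L applied to the corresponding row of X
X_embedded = X.dot(L.T)
```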