metric_learn.LSML

class metric_learn.LSML(tol=0.001, max_iter=1000, prior='identity', verbose=False, preprocessor=None, random_state=None)
Least Squared-residual Metric Learning (LSML)
LSML proposes a simple yet effective algorithm that minimizes a convex objective function corresponding to the sum of squared residuals of constraints. The algorithm takes its constraints in the form of relative distance comparisons, which is especially useful when pairwise constraints are not natural to obtain and algorithms based on pairwise constraints therefore cannot be deployed. Furthermore, its sparsity extension leads to more stable estimation when the dimension is high and only a small number of constraints is given.
Read more in the User Guide.
Parameters:
prior : string or numpy array, optional (default='identity')
Prior to set for the metric. Possible options are 'identity', 'covariance', 'random', and a numpy array of shape (n_features, n_features). For LSML, the prior should be strictly positive definite (PD).
 'identity'
An identity matrix of shape (n_features, n_features).
 'covariance'
The inverse covariance matrix.
 'random'
The initial Mahalanobis matrix will be a random positive definite (PD) matrix of shape (n_features, n_features), generated using sklearn.datasets.make_spd_matrix.
 numpy array
A positive definite (PD) matrix of shape (n_features, n_features), that will be used as such to set the prior.
tol : float, optional (default=1e-3)
Convergence tolerance of the optimization procedure.
max_iter : int, optional (default=1000)
Maximum number of iterations of the optimization procedure.
verbose : bool, optional (default=False)
If True, prints information while learning.
preprocessor : array-like, shape=(n_samples, n_features) or callable
The preprocessor to call to get tuples from indices. If array-like, tuples will be formed like this: X[indices].
random_state : int or numpy.RandomState or None, optional (default=None)
A pseudo random number generator object or a seed for it if int. If prior='random', random_state is used to set the random prior.
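The prior options above can be sketched with plain NumPy. This is an illustration only, not the library's internal code: the helper `is_positive_definite` and the data `X` are hypothetical, and the check simply relies on the fact that a Cholesky factorization succeeds only for positive definite matrices.

```python
import numpy as np

def is_positive_definite(M):
    """Return True if M is symmetric and strictly positive definite."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, M.T):
        return False
    try:
        np.linalg.cholesky(M)  # raises LinAlgError if M is not PD
        return True
    except np.linalg.LinAlgError:
        return False

rng = np.random.RandomState(0)
X = rng.randn(50, 3)  # hypothetical data: 50 samples, 3 features

# Candidate priors mirroring the documented options.
prior_identity = np.eye(X.shape[1])
prior_covariance = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance
prior_custom = np.diag([1.0, 2.0, 0.5])  # user-supplied PD array

# A singular matrix is only positive semi-definite, so it would not qualify.
not_pd = np.diag([1.0, 0.0, 1.0])

for name, P in [("identity", prior_identity),
                ("covariance", prior_covariance),
                ("custom", prior_custom),
                ("singular", not_pd)]:
    print(name, is_positive_definite(P))
```

Checking a custom array this way before passing it as `prior` is one possible safeguard; whether the estimator performs an equivalent check itself is not stated in this page.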
See also
metric_learn.LSML
 The original weakly-supervised algorithm
 Supervised versions of weakly-supervised algorithms
 The section of the project documentation that describes the supervised version of weakly supervised estimators.
References
[1] Liu et al. Metric Learning from Relative Comparisons by Minimizing Squared Residual. ICDM 2012.
[2] Code adapted from https://gist.github.com/kcarnold/5439917
Examples
>>> from metric_learn import LSML
>>> quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
...                [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
...                [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
...                [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
>>> # we want to make closer points where the first feature is close, and
>>> # further if the second feature is close
>>> lsml = LSML()
>>> lsml.fit(quadruplets)
Attributes:
n_iter_ : int
The number of iterations the solver has run.
components_ : numpy.ndarray, shape=(n_features, n_features)
The linear transformation L deduced from the learned Mahalanobis metric (see function components_from_metric).
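The relationship between a Mahalanobis matrix M and a linear transformation L is that L.T @ L = M. The sketch below shows two ways to obtain such an L with NumPy, and that L is not unique. The matrix `M` is a hypothetical example, and which decomposition components_from_metric actually uses is not stated here, so treat both as assumptions.

```python
import numpy as np

# Hypothetical positive definite Mahalanobis matrix.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Option 1: Cholesky. M = C @ C.T with C lower triangular, so L = C.T.
L_chol = np.linalg.cholesky(M).T

# Option 2: symmetric square root from the eigendecomposition of M.
w, V = np.linalg.eigh(M)
L_sym = V @ np.diag(np.sqrt(w)) @ V.T

# Both reproduce M, although the two matrices differ.
print(np.allclose(L_chol.T @ L_chol, M))
print(np.allclose(L_sym.T @ L_sym, M))
print(np.allclose(L_chol, L_sym))
```

Any such L defines the same distances, because the metric depends on L only through the product L.T @ L.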
Methods
decision_function(quadruplets)  Predicts differences between sample distances in input quadruplets.
fit(quadruplets[, weights])  Learn the LSML model.
get_mahalanobis_matrix()  Returns a copy of the Mahalanobis matrix learned by the metric learner.
get_metric()  Returns a function that takes as input two 1D arrays and outputs the learned metric score on these two points.
get_params([deep])  Get parameters for this estimator.
predict(quadruplets)  Predicts the ordering between sample distances in input quadruplets.
score(quadruplets)  Computes score on input quadruplets.
score_pairs(pairs)  Returns the learned Mahalanobis distance between pairs.
set_params(**params)  Set the parameters of this estimator.
transform(X)  Embeds data points in the learned linear embedding space.
__init__(tol=0.001, max_iter=1000, prior='identity', verbose=False, preprocessor=None, random_state=None)
Initialize self. See help(type(self)) for accurate signature.

decision_function(quadruplets)
Predicts differences between sample distances in input quadruplets.
For each quadruplet in the samples, computes the learned metric of the second pair minus the learned metric of the first pair. The higher it is, the more probable it is that the pairs in the quadruplet are presented in the right order, i.e. that the label of the quadruplet is 1. The lower it is, the more probable it is that the label of the quadruplet is -1.
Parameters:
quadruplets : array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)
3D array of quadruplets to predict, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.
Returns:
decision_function : numpy.ndarray of floats, shape=(n_constraints,)
Metric differences.
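The computation described for decision_function can be sketched in NumPy as follows. This is a standalone illustration, not the library's implementation: `mahalanobis`, `decision_function_sketch`, and the matrix `M` are hypothetical names, and the identity matrix stands in for a learned metric.

```python
import numpy as np

def mahalanobis(u, v, M):
    """Mahalanobis distance between two 1D points under the matrix M."""
    diff = u - v
    return np.sqrt(diff @ M @ diff)

def decision_function_sketch(quadruplets, M):
    """Distance of the second pair minus distance of the first pair."""
    quadruplets = np.asarray(quadruplets, dtype=float)
    return np.array([
        mahalanobis(q[2], q[3], M) - mahalanobis(q[0], q[1], M)
        for q in quadruplets
    ])

M = np.eye(2)  # stand-in for a learned Mahalanobis matrix
quadruplets = [
    [[0, 0], [1, 0], [0, 0], [3, 4]],  # first pair closer: positive value
    [[0, 0], [3, 4], [0, 0], [1, 0]],  # first pair farther: negative value
]

values = decision_function_sketch(quadruplets, M)
print(values)           # [ 4. -4.]
print(np.sign(values))  # [ 1. -1.]
```

Taking the sign of each value gives a label of 1 or -1, which matches the ordering convention described above; a tie would give 0 in this sketch.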

fit(quadruplets, weights=None)
Learn the LSML model.
Parameters:
quadruplets : array-like, shape=(n_constraints, 4, n_features) or (n_constraints, 4)
3D array-like of quadruplets of points or 2D array of quadruplets of indicators (indices). To supervise the algorithm correctly, the four samples of each quadruplet should be ordered such that d(quadruplets[i, 0], quadruplets[i, 1]) < d(quadruplets[i, 2], quadruplets[i, 3]) for all 0 <= i < n_constraints.
weights : (n_constraints,) array of floats, optional
Scale factor for each constraint.
Returns:
self : object
Returns the instance.
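The two accepted input layouts for quadruplets can be illustrated with NumPy indexing. The data `X` and the `indices` array below are hypothetical. The Euclidean check at the end only shows which columns form the "closer" pair and which form the "farther" pair; in practice the ordering comes from the supervision, not from Euclidean distance.

```python
import numpy as np

# Hypothetical dataset of 4 points with 2 features each.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [5.0, 5.0],
              [9.0, 9.0]])

# 2D layout: each row holds four indices into X.
indices = np.array([[0, 1, 0, 2],
                    [2, 3, 0, 3]])

# 3D layout: the same constraints with the points written out.
quadruplets = X[indices]
print(quadruplets.shape)  # (2, 4, 2)

# Columns 0 and 1 form the pair meant to be closer,
# columns 2 and 3 the pair meant to be farther apart.
d_first = np.linalg.norm(quadruplets[:, 0] - quadruplets[:, 1], axis=1)
d_second = np.linalg.norm(quadruplets[:, 2] - quadruplets[:, 3], axis=1)
print(d_first < d_second)  # [ True  True]
```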

get_mahalanobis_matrix()
Returns a copy of the Mahalanobis matrix learned by the metric learner.
Returns:
M : numpy.ndarray, shape=(n_features, n_features)
The copy of the learned Mahalanobis matrix.

get_metric()
Returns a function that takes as input two 1D arrays and outputs the learned metric score on these two points.
This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn's estimators.
Returns:
metric_fun : function
The function described above.
See also
score_pairs
 a method that returns the metric score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner's preprocessor, and works on concatenated arrays.
Examples
>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2,
  weights='uniform')

get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
Parameter names mapped to their values.

predict(quadruplets)
Predicts the ordering between sample distances in input quadruplets.
For each quadruplet, returns 1 if the quadruplet is in the right order (first pair is more similar than second pair), and -1 if not.
Parameters:
quadruplets : array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)
3D array of quadruplets to predict, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.
Returns:
prediction : numpy.ndarray of floats, shape=(n_constraints,)
Predictions of the ordering of pairs, for each quadruplet.

score(quadruplets)
Computes score on input quadruplets.
Returns the accuracy score of the following classification task: a record is correctly classified if the predicted similarity between the first two samples is higher than that of the last two.
Parameters:
quadruplets : array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)
3D array of quadruplets to score, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.
Returns:
score : float
The quadruplets score.
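The accuracy described above can be sketched as the fraction of quadruplets whose first pair is closer than the second pair under the metric. The function `score_sketch` and the matrix `M` are hypothetical; this is one way to read the definition, not the library's code.

```python
import numpy as np

def score_sketch(quadruplets, M):
    """Fraction of quadruplets whose first pair is closer than the second."""
    q = np.asarray(quadruplets, dtype=float)
    diff_first = q[:, 0] - q[:, 1]
    diff_second = q[:, 2] - q[:, 3]
    # Row-wise Mahalanobis distances: sqrt(diff^T M diff) for each quadruplet.
    d_first = np.sqrt(np.einsum('ij,jk,ik->i', diff_first, M, diff_first))
    d_second = np.sqrt(np.einsum('ij,jk,ik->i', diff_second, M, diff_second))
    predictions = np.sign(d_second - d_first)  # 1 if in the right order
    return np.mean(predictions == 1)

M = np.eye(2)  # stand-in for a learned Mahalanobis matrix
quadruplets = [
    [[0, 0], [1, 0], [0, 0], [3, 4]],  # right order
    [[0, 0], [3, 4], [0, 0], [1, 0]],  # wrong order
    [[1, 1], [1, 2], [0, 0], [2, 2]],  # right order
]

accuracy = score_sketch(quadruplets, M)
print(accuracy)
```

Two of the three quadruplets are correctly ordered here, so the sketch yields an accuracy of two thirds.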

score_pairs(pairs)
Returns the learned Mahalanobis distance between pairs.
This distance is defined as: \(d_M(x, x') = \sqrt{(x-x')^T M (x-x')}\), where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we also have: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e - x_e')}\), with \(x_e = L x\) (see MahalanobisMixin).
Parameters:
pairs : array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
Returns:
scores : numpy.ndarray of shape=(n_pairs,)
The learned Mahalanobis distance for every pair.
See also
get_metric
 a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
Mahalanobis Distances
 The section of the project documentation that describes Mahalanobis Distances.
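The distance formula above can be evaluated for a whole array of pairs at once. The sketch below is a NumPy illustration under the stated definition; `score_pairs_sketch` and the matrix `M` are hypothetical and stand in for the estimator's method and its learned matrix.

```python
import numpy as np

def score_pairs_sketch(pairs, M):
    """Mahalanobis distance sqrt((x - x')^T M (x - x')) for each pair."""
    pairs = np.asarray(pairs, dtype=float)  # shape (n_pairs, 2, n_features)
    diffs = pairs[:, 0] - pairs[:, 1]       # shape (n_pairs, n_features)
    # einsum evaluates diff^T M diff independently for every row.
    squared = np.einsum('ij,jk,ik->i', diffs, M, diffs)
    return np.sqrt(squared)

# Hypothetical Mahalanobis matrix that stretches the first feature
# and shrinks the second.
M = np.array([[2.0, 0.0],
              [0.0, 0.5]])

pairs = [
    [[0, 0], [1, 0]],  # squared distance 2 * 1 = 2
    [[0, 0], [3, 2]],  # squared distance 2 * 9 + 0.5 * 4 = 20
]

distances = score_pairs_sketch(pairs, M)
print(distances)
```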

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : object
Estimator instance.

transform(X)
Embeds data points in the learned linear embedding space.
Transforms samples in X into X_embedded, samples inside a new embedding space such that: X_embedded = X.dot(L.T), where L is the learned linear transformation (see MahalanobisMixin).
Parameters:
X : numpy.ndarray, shape=(n_samples, n_features)
The data points to embed.
Returns:
X_embedded : numpy.ndarray, shape=(n_samples, n_components)
The embedded data points.
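The embedding X_embedded = X.dot(L.T) can be checked numerically: Euclidean distances between embedded points should equal Mahalanobis distances between the original points. In this sketch `M`, `L`, and `X` are hypothetical values, and L is taken from a Cholesky factorization as one valid choice satisfying L.T @ L = M.

```python
import numpy as np

# Hypothetical learned Mahalanobis matrix and one matching transformation.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
L = np.linalg.cholesky(M).T  # satisfies L.T @ L = M

X = np.array([[1.0, 2.0],
              [3.0, -1.0],
              [0.0, 0.5]])

# Embed every row of X, as in the formula above.
X_embedded = X.dot(L.T)
print(X_embedded.shape)  # (3, 2)

# Distance between the first two points, computed both ways.
diff = X[0] - X[1]
d_mahalanobis = np.sqrt(diff @ M @ diff)
d_embedded = np.linalg.norm(X_embedded[0] - X_embedded[1])
print(np.isclose(d_mahalanobis, d_embedded))  # True
```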