`metric_learn`.LSML

class metric_learn.LSML(tol=0.001, max_iter=1000, prior='identity', verbose=False, preprocessor=None, random_state=None)[source]

Least Squared-residual Metric Learning (LSML)

LSML proposes a simple yet effective algorithm that minimizes a convex objective function corresponding to the sum of squared residuals of constraints. The algorithm uses constraints in the form of relative distance comparisons. This is especially useful where pairwise constraints are not natural to obtain, which makes algorithms based on pairwise constraints infeasible to deploy. Furthermore, its sparsity extension leads to more stable estimation when the dimension is high and only a small number of constraints is given.

See also

metric_learn.LSML: The original weakly-supervised algorithm
Supervised versions of weakly-supervised algorithms: The section of the project documentation that describes the supervised version of weakly supervised estimators.

References

[1]

Liu et al. Metric Learning from Relative Comparisons by Minimizing Squared Residual. ICDM 2012.

[2]

Code adapted from https://gist.github.com/kcarnold/5439917

Examples

>>> from metric_learn import LSML
>>> quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
...                [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
...                [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
...                [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
>>> # we want to bring points closer when their first feature is close, and
>>> # push them further apart when their second feature is close
>>> lsml = LSML()
>>> lsml.fit(quadruplets)

Attributes:

n_iter_ (int): The number of iterations the solver has run.
components_ (numpy.ndarray, shape=(n_features, n_features)): The linear transformation L deduced from the learned Mahalanobis metric (see function components_from_metric).

Methods

`decision_function`(quadruplets)	Predicts differences between sample distances in input quadruplets.
`fit`(quadruplets[, weights])	Learn the LSML model.
`get_mahalanobis_matrix`()	Returns a copy of the Mahalanobis matrix learned by the metric learner.
`get_metadata_routing`()	Get metadata routing of this object.
`get_metric`()	Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points.
`get_params`([deep])	Get parameters for this estimator.
`pair_distance`(pairs)	Returns the learned Mahalanobis distance between pairs.
`pair_score`(pairs)	Returns the opposite of the learned Mahalanobis distance between pairs.
`predict`(quadruplets)	Predicts the ordering between sample distances in input quadruplets.
`score`(quadruplets)	Computes score on input quadruplets.
`score_pairs`(pairs)	Returns the learned Mahalanobis distance between pairs.
`set_decision_function_request`(*[, quadruplets])	Request metadata passed to the `decision_function` method.
`set_fit_request`(*[, quadruplets, weights])	Request metadata passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.
`set_predict_request`(*[, quadruplets])	Request metadata passed to the `predict` method.
`set_score_request`(*[, quadruplets])	Request metadata passed to the `score` method.
`transform`(X)	Embeds data points in the learned linear embedding space.

__init__(tol=0.001, max_iter=1000, prior='identity', verbose=False, preprocessor=None, random_state=None)

classes_ = array([0, 1])

decision_function(quadruplets)

Predicts differences between sample distances in input quadruplets.

For each quadruplet in the samples, computes the learned metric of the second pair minus the learned metric of the first pair. The higher it is, the more probable it is that the pairs in the quadruplet are presented in the right order, i.e. that the label of the quadruplet is 1. The lower it is, the more probable it is that the label of the quadruplet is -1.

Parameters:

quadruplets (array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)): 3D array of quadruplets to predict, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.

Returns:

decision_function (numpy.ndarray of floats, shape=(n_constraints,)): Metric differences.
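The computation described above can be sketched with NumPy alone. This is an illustration under assumptions, not the library's implementation: the helper names `mahalanobis_distance` and `quadruplet_decision` are hypothetical, and `M` stands in for a learned Mahalanobis matrix (the identity is used here so the values are easy to check by hand).

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) between two 1D arrays."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

def quadruplet_decision(quadruplets, M):
    """Second-pair distance minus first-pair distance, for each quadruplet."""
    quadruplets = np.asarray(quadruplets, dtype=float)
    return np.array([
        mahalanobis_distance(q[2], q[3], M) - mahalanobis_distance(q[0], q[1], M)
        for q in quadruplets
    ])

# Illustrative values only: with M = identity the distance is Euclidean.
M = np.eye(2)
quadruplets = [[[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [3.0, 4.0]],   # first pair closer
               [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]]   # second pair closer
values = quadruplet_decision(quadruplets, M)
```

A positive entry indicates that the first pair is closer than the second, which is the ordering expected of a correctly ordered quadruplet.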

fit(quadruplets, weights=None)[source]

Learn the LSML model.

Parameters:

quadruplets (array-like, shape=(n_constraints, 4, n_features) or (n_constraints, 4)): 3D array-like of quadruplets of points, or 2D array of quadruplets of indices. To supervise the algorithm correctly, the four samples must be ordered such that: d(quadruplets[i, 0], quadruplets[i, 1]) < d(quadruplets[i, 2], quadruplets[i, 3]) for all 0 <= i < n_constraints.
weights (array of floats, shape=(n_constraints,), optional): Scale factor for each constraint.

Returns:

self (object): Returns the instance.
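The two accepted input shapes describe the same constraints. The sketch below, using only NumPy, shows how a 2D array of indices maps to the 3D array of points; the data set `X` and the index rows are made-up values for illustration. With the index form, the learner would presumably be constructed with `preprocessor=X` so that it can resolve the indices itself.

```python
import numpy as np

# Hypothetical data set: 5 points with 2 features each.
X = np.array([[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7], [3.2, 4.6]])

# 2D form: each row holds four indices into X, ordered so that the first pair
# (columns 0 and 1) is meant to be closer than the second pair (columns 2 and 3).
quadruplet_indices = np.array([[0, 3, 1, 2],
                               [1, 4, 0, 2]])

# 3D form: the same constraints as explicit points, obtained by fancy indexing.
quadruplets = X[quadruplet_indices]   # shape (n_constraints, 4, n_features)

# One optional scale factor per constraint, matching the `weights` parameter.
weights = np.ones(len(quadruplet_indices))
```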

get_mahalanobis_matrix()

Returns a copy of the Mahalanobis matrix learned by the metric learner.

Returns:

M (numpy.ndarray, shape=(n_features, n_features)): The copy of the learned Mahalanobis matrix.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_metric()

Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points. Depending on the algorithm, it can return a distance or a similarity function between pairs.

This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn’s estimators.

Returns:

metric_fun (function): The function described above.

See also

pair_distance: a method that returns the distance between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.
pair_score: a method that returns the similarity score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.

Examples

>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y) 
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun
          at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2,
  weights='uniform')
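The independence property described for get_metric can be pictured as a closure that keeps its own copy of the matrix. The sketch below is an assumption about how such a function could be built, not the library's code; `make_metric` is a hypothetical helper.

```python
import numpy as np

def make_metric(M):
    """Return a distance function that keeps its own private copy of M."""
    M_frozen = np.array(M, dtype=float, copy=True)

    def metric_fun(u, v):
        diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
        return float(np.sqrt(diff @ M_frozen @ diff))

    return metric_fun

M = np.eye(2)
metric = make_metric(M)
before = metric([0.0, 0.0], [3.0, 4.0])

M *= 100.0                               # modify the original matrix in place
after = metric([0.0, 0.0], [3.0, 4.0])   # the closure still uses its own copy
```

Because the returned function takes two 1D arrays and returns a float, it has the shape expected of a callable passed as the metric argument of a scikit-learn estimator.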

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

pair_distance(pairs)

Returns the learned Mahalanobis distance between pairs.

This distance is defined as: $d_M(x, x') = \sqrt{(x-x')^T M (x-x')}$ where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we have also: $d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}$, with $x_e = L x$ (See MahalanobisMixin).

Parameters:

pairs (array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)): 3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:

scores (numpy.ndarray, shape=(n_pairs,)): The learned Mahalanobis distance for every pair.

See also

get_metric: a method that returns a function to compute the metric between two points. The difference with pair_distance is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
Mahalanobis Distances: The section of the project documentation that describes Mahalanobis Distances.
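The equivalence between the two formulas for $d_M$ can be checked numerically. The sketch below uses only NumPy; the matrix `L` and the pairs are made-up values, and M is built as $L^T L$ so that it is positive semidefinite by construction.

```python
import numpy as np

# Hypothetical linear transformation L; M = L^T L is positive semidefinite.
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
M = L.T @ L

pairs = np.array([[[1.0, 2.0], [3.0, 1.0]],
                  [[0.0, 0.0], [1.0, 1.0]]])   # shape (n_pairs, 2, n_features)

diffs = pairs[:, 0, :] - pairs[:, 1, :]
# Quadratic form (x - x')^T M (x - x'), evaluated for every pair at once.
dist_from_M = np.sqrt(np.einsum('ij,jk,ik->i', diffs, M, diffs))

# Same quantity as a Euclidean distance between embeddings x_e = L x.
embedded = pairs @ L.T
dist_from_L = np.linalg.norm(embedded[:, 0, :] - embedded[:, 1, :], axis=1)
```

Both arrays hold one distance per pair and agree up to floating-point error.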

pair_score(pairs)

Returns the opposite of the learned Mahalanobis distance between pairs.

Parameters:

pairs (array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)): 3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:

scores (numpy.ndarray, shape=(n_pairs,)): The opposite of the learned Mahalanobis distance for every pair.

See also

get_metric: a method that returns a function to compute the metric between two points. The difference with pair_score is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
Mahalanobis Distances: The section of the project documentation that describes Mahalanobis Distances.

predict(quadruplets)

Predicts the ordering between sample distances in input quadruplets.

For each quadruplet, returns 1 if the quadruplet is in the right order (first pair is more similar than second pair), and -1 if not.

Parameters:

quadruplets (array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)): 3D array of quadruplets to predict, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.

Returns:

prediction (numpy.ndarray of floats, shape=(n_constraints,)): Predictions of the ordering of pairs, for each quadruplet.

score(quadruplets)

Computes score on input quadruplets.

Returns the accuracy score of the following classification task: a record is correctly classified if the predicted similarity between the first two samples is higher than that of the last two.

Parameters:

quadruplets (array-like, shape=(n_quadruplets, 4, n_features) or (n_quadruplets, 4)): 3D array of quadruplets to score, with each row corresponding to four points, or 2D array of indices of quadruplets if the metric learner uses a preprocessor.

Returns:

score (float): The quadruplets score.
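The relationship between decision values, predictions, and the accuracy score can be sketched as follows. This is an illustration under the assumption that predictions are the sign of the decision values and that every quadruplet passed to score is taken to be correctly ordered; the decision values are made up.

```python
import numpy as np

# Hypothetical decision values: second-pair distance minus first-pair distance.
decision_values = np.array([4.0, -4.0, 0.5, 2.0])

# A positive value means the first pair is closer, i.e. the expected order.
predictions = np.sign(decision_values)

# Every input quadruplet is assumed to be correctly ordered (label 1), so the
# accuracy is the fraction of predictions equal to 1.
accuracy = float(np.mean(predictions == 1))
```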

score_pairs(pairs)

Returns the learned Mahalanobis distance between pairs.

This distance is defined as: $d_M(x, x') = \sqrt{(x-x')^T M (x-x')}$ where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we have also: $d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}$, with $x_e = L x$ (See MahalanobisMixin).

Deprecated since version 0.7.0: Please use pair_distance instead.

Warning

This method will be removed in 0.8.0. Please refer to pair_distance or pair_score. This change will occur in order to add learners that don’t necessarily learn a Mahalanobis distance.

Parameters:

pairs (array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)): 3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:

scores (numpy.ndarray, shape=(n_pairs,)): The learned Mahalanobis distance for every pair.

See also

get_metric: a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
Mahalanobis Distances: The section of the project documentation that describes Mahalanobis Distances.

set_decision_function_request(*, quadruplets: bool | None | str = '$UNCHANGED$') → LSML

Request metadata passed to the decision_function method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to decision_function if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to decision_function.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

quadruplets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for quadruplets parameter in decision_function.

Returns:

self (object): The updated object.

set_fit_request(*, quadruplets: bool | None | str = '$UNCHANGED$', weights: bool | None | str = '$UNCHANGED$') → LSML

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

quadruplets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for quadruplets parameter in fit.
weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for weights parameter in fit.

Returns:

self (object): The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.

set_predict_request(*, quadruplets: bool | None | str = '$UNCHANGED$') → LSML

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

quadruplets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for quadruplets parameter in predict.

Returns:

self (object): The updated object.

set_score_request(*, quadruplets: bool | None | str = '$UNCHANGED$') → LSML

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

quadruplets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for quadruplets parameter in score.

Returns:

self (object): The updated object.

transform(X)

Embeds data points in the learned linear embedding space.

Transforms samples in X into X_embedded, samples inside a new embedding space such that: X_embedded = X.dot(L.T), where L is the learned linear transformation (See MahalanobisMixin).

Parameters:

X (numpy.ndarray, shape=(n_samples, n_features)): The data points to embed.

Returns:

X_embedded (numpy.ndarray, shape=(n_samples, n_components)): The embedded data points.
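The formula X_embedded = X.dot(L.T) applies L to every sample while keeping samples as rows. A minimal NumPy sketch, with a made-up matrix `L` standing in for the learned transformation:

```python
import numpy as np

# Hypothetical learned transformation with n_components = n_features = 2.
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.0, 0.0]])   # shape (n_samples, n_features)

X_embedded = X.dot(L.T)      # shape (n_samples, n_components)

# Row i of X_embedded is L applied to sample i.
row_by_row = np.array([L.dot(x) for x in X])
```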

Examples using `metric_learn.LSML`

Algorithms walkthrough
