metric_learn.SCML
- class metric_learn.SCML(beta=1e-05, basis='triplet_diffs', n_basis=None, gamma=0.005, max_iter=10000, output_iter=500, batch_size=10, verbose=False, preprocessor=None, random_state=None)
Sparse Compositional Metric Learning (SCML)
SCML learns a squared Mahalanobis distance from triplet constraints by optimizing sparse positive weights assigned to a set of \(K\) rank-one PSD bases. This can be formulated as an optimization problem with only \(K\) parameters, which can be solved with an efficient stochastic composite scheme.
Read more in the User Guide.
Warning
SCML is still somewhat experimental; please report anything that fails or does not work as expected.
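The composition described above can be illustrated with plain NumPy. This is a minimal sketch of the idea only, not the library's implementation: the basis matrix `B` and the weight vector `w` are made-up values, whereas in SCML the weights are learned from triplets.

```python
import numpy as np

rng = np.random.default_rng(0)

n_basis, n_features = 6, 3
# Hypothetical basis set: each row b_k defines a rank-one PSD matrix b_k b_k^T.
B = rng.normal(size=(n_basis, n_features))

# Hypothetical sparse, non-negative weights (most entries are zero).
w = np.array([0.0, 0.7, 0.0, 0.0, 1.5, 0.0])

# M = sum_k w_k * b_k b_k^T, written compactly as B^T diag(w) B.
M = B.T @ np.diag(w) @ B

# M is symmetric PSD, so it defines a valid squared Mahalanobis distance.
eigenvalues = np.linalg.eigvalsh(M)
x, y = rng.normal(size=(2, n_features))
sq_dist = (x - y) @ M @ (x - y)
```

Only the `K = n_basis` entries of `w` would be optimized, which is why the problem has \(K\) parameters regardless of the number of features.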
- Parameters:
- beta: float (default=1e-5)
L1 regularization parameter.
- basis: string or array-like, optional (default=’triplet_diffs’)
Set of bases to construct the metric. Possible options are ‘triplet_diffs’ and an array-like of shape (n_basis, n_features).
- ‘triplet_diffs’
The basis set is constructed iteratively from differences between points of n_features positive or negative pairs randomly sampled from the triplet constraints. Requires the number of training triplets to be greater than or equal to n_features.
- array-like
A matrix of shape (n_basis, n_features) that will be used as the basis set for the metric construction.
- n_basis: int, optional
Number of bases to be yielded. If it is not set, a default is computed based on basis and the input.
- gamma: float (default = 5e-3)
Learning rate for the optimization algorithm.
- max_iter: int (default = 10000)
Number of iterations for the algorithm.
- output_iter: int (default = 500)
Number of iterations between checks of the current weights' performance. This information is printed if verbose is True.
- verbose: bool, optional
If True, prints information while learning.
- preprocessor: array-like, shape=(n_samples, n_features) or callable
The preprocessor to call to get triplets from indices. If array-like, triplets will be formed like this: X[indices].
- random_state: int or numpy.RandomState or None, optional (default=None)
A pseudo random number generator object or a seed for it if int.
See also
metric_learn.SCML_Supervised
The supervised version of the algorithm.
- Supervised versions of weakly-supervised algorithms
The section of the project documentation that describes the supervised version of weakly supervised estimators.
References
[1] Y. Shi, A. Bellet and F. Sha. Sparse Compositional Metric Learning. AAAI, 2014.
[2] Adapted from original Matlab implementation.
Examples
>>> from metric_learn import SCML
>>> triplets = [[[1.2, 7.5], [1.3, 1.5], [6.2, 9.7]],
...             [[1.3, 4.5], [3.2, 4.6], [5.4, 5.4]],
...             [[3.2, 7.5], [3.3, 1.5], [8.2, 9.7]],
...             [[3.3, 4.5], [5.2, 4.6], [7.4, 5.4]]]
>>> scml = SCML()
>>> scml.fit(triplets)
- Attributes:
- components_: numpy.ndarray, shape=(n_features, n_features)
The linear transformation L deduced from the learned Mahalanobis metric (see function _components_from_basis_weights).
Methods
- decision_function(triplets): Predicts differences between sample distances in input triplets.
- fit(triplets): Learn the SCML model.
- get_mahalanobis_matrix(): Returns a copy of the Mahalanobis matrix learned by the metric learner.
- get_metadata_routing(): Get metadata routing of this object.
- get_metric(): Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points.
- get_params([deep]): Get parameters for this estimator.
- pair_distance(pairs): Returns the learned Mahalanobis distance between pairs.
- pair_score(pairs): Returns the opposite of the learned Mahalanobis distance between pairs.
- predict(triplets): Predicts the ordering between sample distances in input triplets.
- score(triplets): Computes score on input triplets.
- score_pairs(pairs): Returns the learned Mahalanobis distance between pairs.
- set_decision_function_request(*[, triplets]): Request metadata passed to the decision_function method.
- set_fit_request(*[, triplets]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_predict_request(*[, triplets]): Request metadata passed to the predict method.
- set_score_request(*[, triplets]): Request metadata passed to the score method.
- transform(X): Embeds data points in the learned linear embedding space.
- __init__(beta=1e-05, basis='triplet_diffs', n_basis=None, gamma=0.005, max_iter=10000, output_iter=500, batch_size=10, verbose=False, preprocessor=None, random_state=None)
- classes_ = array([0, 1])
- decision_function(triplets)
Predicts differences between sample distances in input triplets.
For each triplet (X_a, X_b, X_c) in the samples, computes the learned distance of the second pair (X_a, X_c) minus the learned distance of the first pair (X_a, X_b). The higher it is, the more probable it is that the pairs in the triplets are presented in the right order, i.e. that the label of the triplet is 1. The lower it is, the more probable it is that the label of the triplet is -1.
- Parameters:
- triplets: array-like, shape=(n_triplets, 3, n_features) or (n_triplets, 3)
3D array of triplets to predict, with each row corresponding to three points, or 2D array of indices of triplets if the metric learner uses a preprocessor.
- Returns:
- decision_function: numpy.ndarray of floats, shape=(n_constraints,)
Metric differences.
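The difference described above can be sketched in NumPy for a fixed Mahalanobis matrix. This is an illustrative sketch, not the library's code: the helper names `mahalanobis_distance` and `triplet_decision` are hypothetical, and the identity matrix stands in for a learned M.

```python
import numpy as np

def mahalanobis_distance(M, x, y):
    """Distance sqrt((x - y)^T M (x - y)) for a PSD matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

def triplet_decision(M, triplets):
    """For each (a, b, c): d(a, c) - d(a, b). Positive means b is closer to a."""
    triplets = np.asarray(triplets, dtype=float)
    return np.array([
        mahalanobis_distance(M, a, c) - mahalanobis_distance(M, a, b)
        for a, b, c in triplets
    ])

# Assumption: the identity matrix stands in for a learned Mahalanobis matrix.
M = np.eye(2)
triplets = [
    [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]],  # b is closer to a: 5 - 1 = 4
    [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]],  # c is closer to a: 1 - 5 = -4
]
scores = triplet_decision(M, triplets)
# Thresholding at zero gives the +1 / -1 labels mentioned above.
labels = np.where(scores > 0, 1, -1)
```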
- fit(triplets)
Learn the SCML model.
- Parameters:
- triplets: array-like, shape=(n_constraints, 3, n_features) or (n_constraints, 3)
3D array-like of triplets of points or 2D array of triplets of indicators. Triplets are assumed to be ordered such that: d(triplets[i, 0], triplets[i, 1]) < d(triplets[i, 0], triplets[i, 2]).
- Returns:
- self: object
Returns the instance.
- get_mahalanobis_matrix()
Returns a copy of the Mahalanobis matrix learned by the metric learner.
- Returns:
- M: numpy.ndarray, shape=(n_features, n_features)
The copy of the learned Mahalanobis matrix.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_metric()
Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points. Depending on the algorithm, it can return a distance or a similarity function between pairs.
This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn’s estimators.
- Returns:
- metric_fun: function
The function described above.
See also
pair_distance
a method that returns the distance between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.
pair_score
a method that returns the similarity score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.
Examples
>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y)
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2,
  weights='uniform')
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- pair_distance(pairs)
Returns the learned Mahalanobis distance between pairs.
This distance is defined as: \(d_M(x, x') = \sqrt{(x-x')^T M (x-x')}\) where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we have also: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}\), with \(x_e = L x\) (See MahalanobisMixin).
- Parameters:
- pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores: numpy.ndarray of shape=(n_pairs,)
The learned Mahalanobis distance for every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with pair_distance is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
- Mahalanobis Distances
The section of the project documentation that describes Mahalanobis Distances.
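The equivalence between the two formulas for pair_distance can be checked numerically. The sketch below assumes a randomly drawn linear map `L` (a stand-in for a learned transformation) and sets M = L^T L; it does not call the library.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumption: a random linear map L stands in for a learned transformation.
L = rng.normal(size=(2, 3))
M = L.T @ L  # the corresponding Mahalanobis matrix

# Four pairs of 3-dimensional points, shape (n_pairs, 2, n_features).
pairs = rng.normal(size=(4, 2, 3))
diffs = pairs[:, 0] - pairs[:, 1]

# sqrt((x - x')^T M (x - x')) for every pair.
d_mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", diffs, M, diffs))

# Euclidean distance after embedding each point as x_e = L x.
embedded = pairs @ L.T
d_embedded = np.linalg.norm(embedded[:, 0] - embedded[:, 1], axis=1)
```

Both arrays agree up to floating-point error, which is the identity stated in the description of pair_distance.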
- pair_score(pairs)
Returns the opposite of the learned Mahalanobis distance between pairs.
- Parameters:
- pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores: numpy.ndarray of shape=(n_pairs,)
The opposite of the learned Mahalanobis distance for every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with pair_score is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
- Mahalanobis Distances
The section of the project documentation that describes Mahalanobis Distances.
- predict(triplets)
Predicts the ordering between sample distances in input triplets.
For each triplet, returns 1 if the first element is closer to the second than to the last and -1 if not.
- Parameters:
- triplets: array-like, shape=(n_triplets, 3, n_features) or (n_triplets, 3)
3D array of triplets to predict, with each row corresponding to three points, or 2D array of indices of triplets if the metric learner uses a preprocessor.
- Returns:
- prediction: numpy.ndarray of floats, shape=(n_constraints,)
Predictions of the ordering of pairs, for each triplet.
- score(triplets)
Computes score on input triplets.
Returns the accuracy score of the following classification task: a triplet (X_a, X_b, X_c) is correctly classified if the predicted similarity between the first pair (X_a, X_b) is higher than that of the second pair (X_a, X_c).
- Parameters:
- triplets: array-like, shape=(n_triplets, 3, n_features) or (n_triplets, 3)
3D array of triplets to score, with each row corresponding to three points, or 2D array of indices of triplets if the metric learner uses a preprocessor.
- Returns:
- score: float
The triplets score.
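As a rough illustration of this accuracy, the sketch below starts from made-up decision values (hypothetical numbers, not produced by a fitted model) and counts the fraction of triplets predicted as correctly ordered.

```python
import numpy as np

# Hypothetical decision values d(a, c) - d(a, b), one per triplet.
decision = np.array([4.0, -4.0, 0.5, 2.0])

# A triplet is predicted as correctly ordered (+1) when the value is positive.
predictions = np.where(decision > 0, 1, -1)

# Accuracy: fraction of triplets predicted as +1.
accuracy = float(np.mean(predictions == 1))
```

Here three of the four triplets are predicted as correctly ordered, so `accuracy` is 0.75.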
- score_pairs(pairs)
Returns the learned Mahalanobis distance between pairs.
This distance is defined as: \(d_M(x, x') = \sqrt{(x-x')^T M (x-x')}\) where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between embeddings of the points in a new space, obtained through a linear transformation. Indeed, we have also: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}\), with \(x_e = L x\) (See MahalanobisMixin).
Deprecated since version 0.7.0: Please use pair_distance instead.
Warning
This method will be removed in 0.8.0. Please refer to pair_distance or pair_score. This change will occur in order to add learners that don’t necessarily learn a Mahalanobis distance.
- Parameters:
- pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)
3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.
- Returns:
- scores: numpy.ndarray of shape=(n_pairs,)
The learned Mahalanobis distance for every pair.
See also
get_metric
a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.
- Mahalanobis Distances
The section of the project documentation that describes Mahalanobis Distances.
- set_decision_function_request(*, triplets: bool | None | str = '$UNCHANGED$') → SCML
Request metadata passed to the decision_function method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to decision_function if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to decision_function.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- triplets: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for triplets parameter in decision_function.
- Returns:
- self: object
The updated object.
- set_fit_request(*, triplets: bool | None | str = '$UNCHANGED$') → SCML
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- triplets: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for triplets parameter in fit.
- Returns:
- self: object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
- set_predict_request(*, triplets: bool | None | str = '$UNCHANGED$') → SCML
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- triplets: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for triplets parameter in predict.
- Returns:
- self: object
The updated object.
- set_score_request(*, triplets: bool | None | str = '$UNCHANGED$') → SCML
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- triplets: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for triplets parameter in score.
- Returns:
- self: object
The updated object.
- transform(X)
Embeds data points in the learned linear embedding space.
Transforms samples in X into X_embedded, samples inside a new embedding space such that: X_embedded = X.dot(L.T), where L is the learned linear transformation (See MahalanobisMixin).
- Parameters:
- X: numpy.ndarray, shape=(n_samples, n_features)
The data points to embed.
- Returns:
- X_embedded: numpy.ndarray, shape=(n_samples, n_components)
The embedded data points.
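The embedding formula can be traced by hand on a small example. The matrix `L` below is a made-up transformation chosen for easy arithmetic, not one learned by SCML.

```python
import numpy as np

# Hypothetical linear transformation, shape (n_components=2, n_features=3).
L = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

# Two samples with three features each, shape (n_samples=2, n_features=3).
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])

# Each row x of X is mapped to L x, so the result has shape (2, 2).
X_embedded = X.dot(L.T)
# X_embedded is [[2., 5.], [0., 1.]]
```

Applying `L` to each row separately (`L @ X[i]`) gives the same rows, which is what the `X.dot(L.T)` form computes in one call.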