metric_learn.SCML_Supervised

class metric_learn.SCML_Supervised(k_genuine=3, k_impostor=10, beta=1e-05, basis='lda', n_basis=None, gamma=0.005, max_iter=10000, output_iter=500, batch_size=10, verbose=False, preprocessor=None, random_state=None)

Supervised version of Sparse Compositional Metric Learning (SCML)

SCML_Supervised creates triplets by taking k_genuine neighbours of the same class and k_impostor neighbours from different classes for each point and then runs the SCML algorithm on these triplets.
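The triplet construction described above can be sketched with a brute-force neighbour search. This is an illustration only, not the library's internal routine: the helper name `make_triplets` and the toy data are hypothetical, and plain Euclidean distance is assumed for the neighbour search.

```python
import numpy as np

def make_triplets(X, y, k_genuine=3, k_impostor=10):
    """Hypothetical helper: return index triplets (anchor, genuine, impostor)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all points.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    triplets = []
    for i in range(len(X)):
        same = np.flatnonzero((y == y[i]) & (np.arange(len(X)) != i))
        other = np.flatnonzero(y != y[i])
        # Nearest same-class and different-class neighbours of point i.
        genuine = same[np.argsort(dist[i, same])[:k_genuine]]
        impostors = other[np.argsort(dist[i, other])[:k_impostor]]
        for g in genuine:
            for m in impostors:
                triplets.append((i, g, m))
    return np.array(triplets)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))          # toy data: 12 points, 2 features
y = np.repeat([0, 1, 2], 4)           # 3 classes with 4 points each
triplets = make_triplets(X, y, k_genuine=2, k_impostor=3)
print(triplets.shape)                 # (72, 3): 12 anchors * 2 genuine * 3 impostors
```

Each row holds the indices of an anchor, a same-class neighbour, and a different-class neighbour, so the number of triplets grows as n_samples * k_genuine * k_impostor.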

Read more in the User Guide.

Warning

SCML is still somewhat experimental. Please report anything that fails or does not work as expected.

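Parameters:
k_genuine: int (default=3)

Number of neighbours of the same class used to form triplets for each point.

k_impostor: int (default=10)

Number of neighbours from different classes used to form triplets for each point.

beta: float (default=1e-5)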

L1 regularization parameter.

basis: string or array-like, optional (default='lda')

Set of bases to construct the metric. Possible options are 'lda' or an array-like of shape (n_basis, n_features).

‘lda’

The basis set of size n_basis is constructed from the LDA of significant local regions in the feature space, found via clustering. For each region center, its k-nearest neighbors are used to obtain the LDA scalings, which correspond to the locally discriminative basis.

array-like

A matrix of shape (n_basis, n_features), that will be used as the basis set for the metric construction.
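As we understand the SCML formulation, the metric is a sparse, non-negative combination of rank-one matrices built from the basis rows, M = sum_i w_i b_i b_i^T. The sketch below composes such an M with NumPy; the basis values and the `weights` vector are made up for illustration and are not produced by the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_features = 6, 3
basis = rng.normal(size=(n_basis, n_features))        # one basis vector per row
weights = np.array([0.5, 0.0, 0.0, 1.2, 0.0, 0.3])    # hypothetical sparse, non-negative weights

# M = sum_i w_i * b_i b_i^T, written as a single matrix product.
M = basis.T @ (weights[:, None] * basis)

# Same result, spelled out term by term.
M_loop = sum(w * np.outer(b, b) for w, b in zip(weights, basis))
print(np.allclose(M, M_loop))
```

Because every weight is non-negative, M is symmetric positive semi-definite by construction, and zero weights simply drop the corresponding basis elements.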

n_basis: int, optional

Number of basis elements to generate. If not set, a default is computed from the chosen basis and the input data.

gamma: float (default = 5e-3)

Learning rate for the optimization algorithm.

max_iter: int (default = 10000)

Number of iterations for the algorithm.

output_iter: int (default = 500)

Interval, in iterations, at which the performance of the current weights is checked; this information is printed if verbose is True.

verbose: bool, optional

If True, prints information while learning.

preprocessor: array-like, shape=(n_samples, n_features) or callable

The preprocessor to call to get triplets from indices. If array-like, triplets will be formed like this: X[indices].
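The `X[indices]` lookup mentioned above is ordinary NumPy fancy indexing. A minimal sketch, with made-up data and hypothetical index triplets:

```python
import numpy as np

X = np.arange(20, dtype=float).reshape(10, 2)   # 10 points, 2 features
indices = np.array([[0, 1, 7],                  # hypothetical triplets of row indices
                    [3, 2, 9]])

# Each index is replaced by the corresponding row of X.
triplets = X[indices]
print(triplets.shape)  # (2, 3, 2): n_triplets, 3 points per triplet, n_features
```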

random_state: int or numpy.random.RandomState or None, optional (default=None)

A pseudo random number generator object or a seed for it if int.

See also

metric_learn.SCML

The weakly supervised version of this algorithm.

References

[1]

Y. Shi, A. Bellet and F. Sha. Sparse Compositional Metric Learning. AAAI, 2014.

[2]

Adapted from the original Matlab implementation.

Examples

>>> from metric_learn import SCML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> scml = SCML_Supervised(random_state=33)
>>> scml.fit(X, Y)
SCML_Supervised(random_state=33)
>>> scml.score_pairs([[X[0], X[1]], [X[0], X[2]]])
array([1.84640733, 1.55984363])
>>> scml.get_metric()(X[0], X[1])
1.8464073327922157
Attributes:
components_: numpy.ndarray, shape=(n_features, n_features)

The linear transformation L deduced from the learned Mahalanobis metric (See function _components_from_basis_weights.)
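The relation between L and the Mahalanobis matrix is M = L^T L. One generic way to obtain a square L from a positive semi-definite M is an eigendecomposition, sketched below. This is only an illustration of the relation and is not necessarily what the private helper `_components_from_basis_weights` does; the matrix M here is randomly generated.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
M = A.T @ A                                  # a random symmetric PSD matrix (illustrative)

# M = V diag(e) V^T, so L = diag(sqrt(e)) V^T satisfies L^T L = M.
eigvals, eigvecs = np.linalg.eigh(M)
eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative round-off
L = np.sqrt(eigvals)[:, None] * eigvecs.T

print(L.shape)
print(np.allclose(L.T @ L, M))
```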

Methods

fit(X, y)

Create constraints from labels and learn the SCML model.

fit_transform(X[, y])

Fit to data, then transform it.

get_mahalanobis_matrix()

Returns a copy of the Mahalanobis matrix learned by the metric learner.

get_metadata_routing()

Get metadata routing of this object.

get_metric()

Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points.

get_params([deep])

Get parameters for this estimator.

pair_distance(pairs)

Returns the learned Mahalanobis distance between pairs.

pair_score(pairs)

Returns the opposite of the learned Mahalanobis distance between pairs.

score_pairs(pairs)

Returns the learned Mahalanobis distance between pairs.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Embeds data points in the learned linear embedding space.

__init__(k_genuine=3, k_impostor=10, beta=1e-05, basis='lda', n_basis=None, gamma=0.005, max_iter=10000, output_iter=500, batch_size=10, verbose=False, preprocessor=None, random_state=None)
fit(X, y)

Create constraints from labels and learn the SCML model.

Parameters:
X: (n x d) matrix

Input data, where each row corresponds to a single instance.

y: (n) array-like

Data labels.

Returns:
self: object

Returns the instance.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns:
X_new: ndarray of shape (n_samples, n_features_new)

Transformed array.

get_mahalanobis_matrix()

Returns a copy of the Mahalanobis matrix learned by the metric learner.

Returns:
M: numpy.ndarray, shape=(n_features, n_features)

The copy of the learned Mahalanobis matrix.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_metric()

Returns a function that takes as input two 1D arrays and outputs the value of the learned metric on these two points. Depending on the algorithm, it can return a distance or a similarity function between pairs.

This function will be independent from the metric learner that learned it (it will not be modified if the initial metric learner is modified), and it can be directly plugged into the metric argument of scikit-learn’s estimators.

Returns:
metric_fun: function

The function described above.
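The independence property described above can be pictured as a closure over a private copy of the learned transformation. The sketch below is a hypothetical stand-in written with NumPy, not the library's code; `make_metric` and the matrix `L` are illustrative names.

```python
import numpy as np

def make_metric(components):
    """Hypothetical factory: return a distance function frozen at call time."""
    L = np.array(components, dtype=float, copy=True)   # private copy of the transformation

    def metric_fun(u, v):
        diff = L @ (np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
        return float(np.sqrt(diff @ diff))

    return metric_fun

L = np.array([[2.0, 0.0], [0.0, 1.0]])
metric = make_metric(L)
before = metric([1.0, 0.0], [0.0, 0.0])

L[0, 0] = 10.0                        # later change to the "learner" state
after = metric([1.0, 0.0], [0.0, 0.0])
print(before, after)                  # 2.0 2.0: the returned function is unaffected
```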

See also

pair_distance

a method that returns the distance between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.

pair_score

a method that returns the similarity score between several pairs of points. Unlike get_metric, this is a method of the metric learner and therefore can change if the metric learner changes. Besides, it can use the metric learner’s preprocessor, and works on concatenated arrays.

Examples

>>> from metric_learn import NCA
>>> from sklearn.datasets import make_classification
>>> from sklearn.neighbors import KNeighborsClassifier
>>> nca = NCA()
>>> X, y = make_classification()
>>> nca.fit(X, y)
>>> knn = KNeighborsClassifier(metric=nca.get_metric())
>>> knn.fit(X, y) 
KNeighborsClassifier(algorithm='auto', leaf_size=30,
  metric=<function MahalanobisMixin.get_metric.<locals>.metric_fun
          at 0x...>,
  metric_params=None, n_jobs=None, n_neighbors=5, p=2,
  weights='uniform')
get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

pair_distance(pairs)

Returns the learned Mahalanobis distance between pairs.

This distance is defined as: \(d_M(x, x') = \sqrt{(x-x')^T M (x-x')}\) where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between the embeddings of the points in a new space, obtained through a linear transformation. Indeed, we also have: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}\), with \(x_e = L x\) (See MahalanobisMixin).

Parameters:
pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)

3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:
scores: numpy.ndarray of shape=(n_pairs,)

The learned Mahalanobis distance for every pair.
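The two expressions for d_M given in the description of this method can be checked numerically. The sketch below uses a random L and random pairs (both made up for illustration) and assumes M = L^T L.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(3, 3))           # illustrative linear transformation
M = L.T @ L                           # corresponding Mahalanobis matrix
pairs = rng.normal(size=(4, 2, 3))    # shape (n_pairs, 2, n_features)

diffs = pairs[:, 0, :] - pairs[:, 1, :]

# sqrt((x - x')^T M (x - x')) for every pair.
d_quadratic = np.sqrt(np.einsum("ij,jk,ik->i", diffs, M, diffs))

# Euclidean distance between the embedded points x_e = L x.
d_embedded = np.linalg.norm(diffs @ L.T, axis=1)

print(np.allclose(d_quadratic, d_embedded))
```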

See also

get_metric

a method that returns a function to compute the metric between two points. The difference with pair_distance is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.

Mahalanobis Distances

The section of the project documentation that describes Mahalanobis Distances.

pair_score(pairs)

Returns the opposite of the learned Mahalanobis distance between pairs.

Parameters:
pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)

3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:
scores: numpy.ndarray of shape=(n_pairs,)

The opposite of the learned Mahalanobis distance for every pair.
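Since the score is the negated distance, a larger score means a more similar pair. A tiny sketch with made-up distance values:

```python
import numpy as np

distances = np.array([1.8, 0.4, 2.5])   # hypothetical pair_distance output
scores = -distances                     # what pair_score would report for the same pairs

# The highest score and the smallest distance point at the same pair.
most_similar = int(np.argmax(scores))
print(most_similar == int(np.argmin(distances)))
```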

See also

get_metric

a method that returns a function to compute the metric between two points. The difference with pair_score is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.

Mahalanobis Distances

The section of the project documentation that describes Mahalanobis Distances.

score_pairs(pairs)

Returns the learned Mahalanobis distance between pairs.

This distance is defined as: \(d_M(x, x') = \sqrt{(x-x')^T M (x-x')}\) where M is the learned Mahalanobis matrix, for every pair of points x and x'. This corresponds to the Euclidean distance between the embeddings of the points in a new space, obtained through a linear transformation. Indeed, we also have: \(d_M(x, x') = \sqrt{(x_e - x_e')^T (x_e- x_e')}\), with \(x_e = L x\) (See MahalanobisMixin).

Deprecated since version 0.7.0: Please use pair_distance instead.

Warning

This method will be removed in 0.8.0. Please refer to pair_distance or pair_score. This change will occur in order to add learners that don’t necessarily learn a Mahalanobis distance.

Parameters:
pairs: array-like, shape=(n_pairs, 2, n_features) or (n_pairs, 2)

3D array of pairs to score, with each row corresponding to two points, or 2D array of indices of pairs if the metric learner uses a preprocessor.

Returns:
scores: numpy.ndarray of shape=(n_pairs,)

The learned Mahalanobis distance for every pair.

See also

get_metric

a method that returns a function to compute the metric between two points. The difference with score_pairs is that it works on two 1D arrays and cannot use a preprocessor. Besides, the returned function is independent of the metric learner and hence is not modified if the metric learner is.

Mahalanobis Distances

The section of the project documentation that describes Mahalanobis Distances.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform: {“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • None: Transform configuration is unchanged

Returns:
self: estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.
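The `<component>__<parameter>` form can be illustrated with a plain scikit-learn Pipeline. The step names `scale` and `knn` below are arbitrary choices for this sketch, and no metric learner is involved.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# The step name, a double underscore, then the parameter of that step.
pipe.set_params(knn__n_neighbors=3, scale__with_mean=False)

print(pipe.named_steps["knn"].n_neighbors)
print(pipe.named_steps["scale"].with_mean)
```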

transform(X)

Embeds data points in the learned linear embedding space.

Transforms samples in X into X_embedded, samples inside a new embedding space such that: X_embedded = X.dot(L.T), where L is the learned linear transformation (See MahalanobisMixin).

Parameters:
X: numpy.ndarray, shape=(n_samples, n_features)

The data points to embed.

Returns:
X_embedded: numpy.ndarray, shape=(n_samples, n_components)

The embedded data points.
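The expression `X_embedded = X.dot(L.T)` applies L to every row of X at once. A small shape check, using a randomly generated L with hypothetical sizes (n_components=2, n_features=4):

```python
import numpy as np

rng = np.random.default_rng(3)
L = rng.normal(size=(2, 4))      # illustrative transformation: n_components x n_features
X = rng.normal(size=(5, 4))      # 5 samples, 4 features

X_embedded = X.dot(L.T)
print(X_embedded.shape)          # (5, 2)

# Row i of the result is L applied to sample i.
print(np.allclose(X_embedded[0], L @ X[0]))
```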