imblearn.combine.SMOTEENN

class imblearn.combine.SMOTEENN(ratio='auto', random_state=None, smote=None, enn=None, k=None, m=None, out_step=None, kind_smote=None, size_ngh=None, n_neighbors=None, kind_enn=None, n_jobs=None)
Class to perform oversampling using SMOTE and cleaning using ENN.
Combine over and undersampling using SMOTE and Edited Nearest Neighbours.
Read more in the User Guide.
Parameters: ratio : str, dict, or callable, optional (default='auto')
Ratio to use for resampling the data set.
If str, has to be one of: (i) 'minority': resample the minority class; (ii) 'majority': resample the majority class; (iii) 'not minority': resample all classes apart from the minority class; (iv) 'all': resample all classes; and (v) 'auto': corresponds to 'all' for oversampling methods and 'not minority' for undersampling methods. The targeted classes will be oversampled or undersampled to reach the same number of samples as the majority or minority class.
If dict, the keys correspond to the targeted classes and the values to the desired number of samples.
If callable, a function taking y and returning a dict. The keys correspond to the targeted classes and the values to the desired number of samples.
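The callable form of ratio can be sketched as follows. This is a minimal example under the contract described above (a function that takes y and returns a dict of target counts); the function name ratio_match_majority is hypothetical and not part of the library.

```python
from collections import Counter


def ratio_match_majority(y):
    """Return {class_label: desired_sample_count} for every non-majority class.

    Hypothetical helper: each smaller class is asked to reach the size of
    the largest class.
    """
    counts = Counter(y)
    n_majority = max(counts.values())
    # Target only the classes that are smaller than the largest one.
    return {label: n_majority for label, n in counts.items() if n < n_majority}


y = [0] * 10 + [1] * 90 + [2] * 30
print(ratio_match_majority(y))  # {0: 90, 2: 90}
```

Under that contract, such a function would be passed as ratio=ratio_match_majority when constructing the sampler.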
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
smote : object, optional (default=SMOTE())
The imblearn.over_sampling.SMOTE object to use. If not given, an imblearn.over_sampling.SMOTE object with default parameters will be used.
enn : object, optional (default=EditedNearestNeighbours())
The imblearn.under_sampling.EditedNearestNeighbours object to use. If not given, an imblearn.under_sampling.EditedNearestNeighbours object with default parameters will be used.
k : int, optional (default=None)
Number of nearest neighbours to use to construct synthetic samples.
Deprecated since version 0.2: k is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.over_sampling.SMOTE object directly instead.
m : int, optional (default=None)
Number of nearest neighbours to use to determine if a minority sample is in danger.
Deprecated since version 0.2: m is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.over_sampling.SMOTE object directly instead.
out_step : float, optional (default=None)
Step size when extrapolating.
Deprecated since version 0.2: out_step is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.over_sampling.SMOTE object directly instead.
kind_smote : str, optional (default=None)
The type of SMOTE algorithm to use, one of the following options: 'regular', 'borderline1', 'borderline2', 'svm'.
Deprecated since version 0.2: kind_smote is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.over_sampling.SMOTE object directly instead.
size_ngh : int, optional (default=None)
Size of the neighbourhood to consider to compute the average distance to the minority point samples.
Deprecated since version 0.2: size_ngh is deprecated from 0.2 and will be replaced in 0.4. Use n_neighbors instead.
n_neighbors : int, optional (default=None)
Size of the neighbourhood to consider to compute the average distance to the minority point samples.
Deprecated since version 0.2: n_neighbors is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.under_sampling.EditedNearestNeighbours object directly instead.
kind_sel : str, optional (default=None)
Strategy to use in order to exclude samples.
If 'all', all neighbours will have to agree with the sample of interest for it not to be excluded.
If 'mode', the majority vote of the neighbours will be used in order to exclude a sample.
Deprecated since version 0.2: kind_sel is deprecated from 0.2 and will be replaced in 0.4. Pass an imblearn.under_sampling.EditedNearestNeighbours object directly instead.
n_jobs : int, optional (default=None)
The number of threads to open if possible.
Deprecated since version 0.2: n_jobs is deprecated from 0.2 and will be replaced in 0.4. Pass imblearn.over_sampling.SMOTE and imblearn.under_sampling.EditedNearestNeighbours objects directly instead.
See also
SMOTETomek
Oversample using SMOTE followed by undersampling by removing Tomek links.
Notes
The method is presented in [R5151].
Supports multiclass resampling. Refer to SMOTE and ENN regarding the scheme which is used.
See SMOTE + ENN and Comparison of the combination of over and undersampling algorithms.
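As a rough illustration of the two stages combined here, the pure-Python sketch below first adds synthetic minority samples by interpolating between a sample and one of its nearest same-class neighbours, then removes samples whose neighbours disagree with their label. It is a simplified sketch, not the library's implementation: the helper names (k_nearest, smote_oversample, enn_clean), the tuple-based data layout, and the choice to clean every class are assumptions made for the example.

```python
import math
import random
from collections import Counter


def k_nearest(idx, X, candidates, k):
    """Indices of the k points among `candidates` closest to X[idx] (idx excluded)."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def smote_oversample(X, y, minority, n_new, k=5, seed=0):
    """Append n_new synthetic minority points built by linear interpolation.

    Assumes the minority class has at least two samples.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(k_nearest(i, X, minority_idx, k))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_out.append(minority)
    return X_out, y_out


def enn_clean(X, y, k=3, kind_sel="all"):
    """Drop samples whose k nearest neighbours disagree with their label."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = [y[j] for j in k_nearest(i, X, everyone, k)]
        if kind_sel == "all":
            agrees = all(v == y[i] for v in votes)
        else:  # "mode": the most common neighbour label must match
            agrees = Counter(votes).most_common(1)[0][0] == y[i]
        if agrees:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Two well-separated clusters: 3 samples of class 0, 5 samples of class 1.
X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
     (4.0, 4.0), (4.2, 4.0), (4.0, 4.2), (4.2, 4.2), (4.1, 4.1)]
y = [0, 0, 0, 1, 1, 1, 1, 1]
X_over, y_over = smote_oversample(X, y, minority=0, n_new=2, k=2)
X_res, y_res = enn_clean(X_over, y_over, k=2)
# Both classes end with 5 samples; the cleaning step removes nothing here
# because no sample has a neighbour from the other cluster.
print(Counter(y_over), Counter(y_res))
```

With overlapping classes the cleaning step would remove samples from either class, which is why the resampled class counts in a real run are not necessarily equal.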
References
[R5151] G. Batista, R. C. Prati, M. C. Monard. “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter 6 (1), 20-29, 2004.
Examples
>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imblearn.combine import SMOTEENN
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print('Original dataset shape {}'.format(Counter(y)))
Original dataset shape Counter({1: 900, 0: 100})
>>> sme = SMOTEENN(random_state=42)
>>> X_res, y_res = sme.fit_sample(X, y)
>>> print('Resampled dataset shape {}'.format(Counter(y_res)))
Resampled dataset shape Counter({0: 900, 1: 881})

__init__(ratio='auto', random_state=None, smote=None, enn=None, k=None, m=None, out_step=None, kind_smote=None, size_ngh=None, n_neighbors=None, kind_enn=None, n_jobs=None)

fit(X, y)
Compute the class statistics before performing sampling.
Parameters: X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns: self : object
Return self.

fit_sample(X, y)
Fit the statistics and resample the data directly.
Parameters: X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns: X_resampled : {array-like, sparse matrix}, shape (n_samples_new, n_features)
The array containing the resampled data.
y_resampled : array-like, shape (n_samples_new,)
The corresponding labels of X_resampled.
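The relationship between fit, sample and fit_sample can be sketched with a toy class. This assumes, as the descriptions suggest, that fit_sample behaves like fit followed by sample on the same data. ToySampler and its resampling rule are hypothetical placeholders and have nothing to do with SMOTE or ENN.

```python
from collections import Counter


class ToySampler:
    """Hypothetical sampler used only to illustrate the fit / sample split."""

    def fit(self, X, y):
        # Record the class statistics; no resampling happens here.
        self.stats_ = Counter(y)
        return self

    def sample(self, X, y):
        # Placeholder rule: keep the first n_min samples of each class,
        # where n_min is the size of the smallest class seen in fit.
        n_min = min(self.stats_.values())
        seen = Counter()
        X_res, y_res = [], []
        for row, label in zip(X, y):
            if seen[label] < n_min:
                seen[label] += 1
                X_res.append(row)
                y_res.append(label)
        return X_res, y_res

    def fit_sample(self, X, y):
        # One call that fits the statistics and then resamples.
        return self.fit(X, y).sample(X, y)


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1, 1]
print(ToySampler().fit_sample(X, y))  # ([[0], [1], [4], [5]], [0, 0, 1, 1])
```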

get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.
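The effect of deep can be illustrated with a small stand-in class. The sketch assumes the scikit-learn convention in which parameters of contained estimators are reported under names of the form component__parameter. ToyEstimator and the parameter names used here are hypothetical.

```python
class ToyEstimator:
    """Hypothetical estimator that stores its constructor parameters."""

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Also report the parameters of nested estimators, prefixed
            # with the name of the attribute that holds them.
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[name + "__" + sub_name] = sub_value
        return out


inner = ToyEstimator(k_neighbors=5)
outer = ToyEstimator(smote=inner, random_state=42)
print(sorted(outer.get_params(deep=False)))  # ['random_state', 'smote']
print(sorted(outer.get_params(deep=True)))   # ['random_state', 'smote', 'smote__k_neighbors']
```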

sample(X, y)
Resample the dataset.
Parameters: X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns: X_resampled : {ndarray, sparse matrix}, shape (n_samples_new, n_features)
The array containing the resampled data.
y_resampled : ndarray, shape (n_samples_new,)
The corresponding labels of X_resampled.
 If