imblearn.over_sampling.SMOTE

class imblearn.over_sampling.SMOTE(ratio='auto', random_state=None, k=None, k_neighbors=5, m=None, m_neighbors=10, out_step=0.5, kind='regular', svm_estimator=None, n_jobs=1)
Class to perform over-sampling using SMOTE.
This object is an implementation of SMOTE (Synthetic Minority Over-sampling Technique) and of the variants Borderline SMOTE 1, Borderline SMOTE 2 and SVM-SMOTE.
Read more in the User Guide.
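The core idea of regular SMOTE, as described in [R7779], is to place each synthetic sample at a random position on the segment between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that idea only, not this class's implementation; the function name smote_sketch and its brute-force neighbour search are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k_neighbors=5, seed=None):
    """Generate n_new synthetic points from a list of minority samples.

    Illustrative only: hypothetical helper, brute-force neighbour search.
    """
    rng = random.Random(seed)
    # Cannot use more neighbours than there are other minority samples.
    k = min(k_neighbors, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample at random as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6, k_neighbors=2, seed=0)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already spanned by the minority class.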
Parameters:
ratio : str, dict, or callable, optional (default='auto')
Ratio to use for resampling the data set.
If str, has to be one of:
(i) 'minority': resample the minority class;
(ii) 'majority': resample the majority class;
(iii) 'not minority': resample all classes apart from the minority class;
(iv) 'all': resample all classes;
(v) 'auto': corresponds to 'all' for over-sampling methods and 'not minority' for under-sampling methods.
The classes targeted will be over-sampled or under-sampled to achieve an equal number of samples with the majority or minority class.
If dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples.
If callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
k : int, optional (default=None)
Number of nearest neighbours used to construct synthetic samples.
Deprecated since version 0.2: k is deprecated from 0.2 and will be replaced in 0.4. Use k_neighbors instead.
k_neighbors : int or object, optional (default=5)
If int, number of nearest neighbours used to construct synthetic samples. If object, an estimator that inherits from sklearn.neighbors.base.KNeighborsMixin that will be used to find the k_neighbors.
m : int, optional (default=None)
Number of nearest neighbours to use to determine if a minority sample is in danger. Used with kind={'borderline1', 'borderline2', 'svm'}.
Deprecated since version 0.2: m is deprecated from 0.2 and will be replaced in 0.4. Use m_neighbors instead.
m_neighbors : int or object, optional (default=10)
If int, number of nearest neighbours to use to determine if a minority sample is in danger. Used with kind={'borderline1', 'borderline2', 'svm'}. If object, an estimator that inherits from sklearn.neighbors.base.KNeighborsMixin that will be used to find the m_neighbors.
out_step : float, optional (default=0.5)
Step size when extrapolating. Used with kind='svm'.
kind : str, optional (default='regular')
The type of SMOTE algorithm to use. One of the following options: 'regular', 'borderline1', 'borderline2', 'svm'.
svm_estimator : object, optional (default=SVC())
If kind='svm', a parametrized sklearn.svm.SVC classifier can be passed.
n_jobs : int, optional (default=1)
The number of threads to open if possible.
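As described for the ratio parameter above, a callable receives y and returns a dict mapping each targeted class to its desired number of samples. The sketch below shows one such callable; the name ratio_match_majority is hypothetical, and the choice to raise every non-majority class to the majority count is only an example policy.

```python
from collections import Counter


def ratio_match_majority(y):
    """Return {class_label: desired_count} for every non-majority class.

    Hypothetical helper: each targeted class is asked to reach the
    size of the largest class.
    """
    counts = Counter(y)
    n_majority = max(counts.values())
    return {label: n_majority for label, n in counts.items() if n < n_majority}


y = [0] * 100 + [1] * 900 + [2] * 300
targets = ratio_match_majority(y)
print(targets)  # {0: 900, 2: 900}
```

Going by the parameter description, such a function would be passed as ratio=ratio_match_majority, and the equivalent dict form would be ratio={0: 900, 2: 900}.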
See also
ADASYN : Over-sample using ADASYN.
Notes
See the original papers: [R7779], [R7879], [R7979] for more details.
Supports multiclass resampling. A one-vs.-rest scheme is used, as originally proposed in [R7779].
See Benchmark oversampling methods in a face recognition task, Evaluate classification by compiling a report, Metrics specific to imbalanced learning, Plotting Validation Curves, Comparison of the different oversampling algorithms, and SMOTE.
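The "in danger" test that m_neighbors controls for the borderline variants follows the rule given in [R7879]: a minority sample whose m nearest neighbours are all from other classes is treated as noise, one with at least half (but not all) of them from other classes is in danger, and the rest are safe. The following is a minimal sketch of that rule only; the function name classify_minority and the brute-force search are assumptions for illustration, not this class's code.

```python
import math


def classify_minority(index, X, y, minority_label, m_neighbors=10):
    """Label the minority sample X[index] as 'noise', 'danger' or 'safe'.

    Hypothetical helper illustrating the Borderline-SMOTE rule.
    """
    # Rank all other samples by distance to the sample of interest.
    others = [j for j in range(len(X)) if j != index]
    others.sort(key=lambda j: math.dist(X[index], X[j]))
    nearest = others[:m_neighbors]
    # Count neighbours that do not belong to the minority class.
    n_majority = sum(1 for j in nearest if y[j] != minority_label)
    m = len(nearest)
    if n_majority == m:
        return "noise"   # surrounded only by other classes
    if 2 * n_majority >= m:
        return "danger"  # at least half of the neighbours are other classes
    return "safe"


# One-dimensional toy data: label 0 is the minority class.
X = [[0.0], [0.1], [0.2], [0.8], [1.0], [1.5], [2.0], [1.75]]
y = [0, 0, 0, 0, 1, 1, 1, 0]
labels = [classify_minority(i, X, y, minority_label=0, m_neighbors=2)
          for i in range(len(X)) if y[i] == 0]
print(labels)  # ['safe', 'safe', 'safe', 'danger', 'noise']
```

In the borderline variants, only the samples labelled as in danger are used as base points for generating synthetic samples.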
References
[R7779] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of artificial intelligence research, 321-357, 2002.
[R7879] H. Han, W. Wen-Yuan, M. Bing-Huan, “Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,” Advances in intelligent computing, 878-887, 2005.
[R7979] H. M. Nguyen, E. W. Cooper, K. Kamei, “Borderline over-sampling for imbalanced data classification,” International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1), pp. 4-21, 2001.
Examples
>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imblearn.over_sampling import SMOTE
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print('Original dataset shape {}'.format(Counter(y)))
Original dataset shape Counter({1: 900, 0: 100})
>>> sm = SMOTE(random_state=42)
>>> X_res, y_res = sm.fit_sample(X, y)
>>> print('Resampled dataset shape {}'.format(Counter(y_res)))
Resampled dataset shape Counter({0: 900, 1: 900})

__init__(ratio='auto', random_state=None, k=None, k_neighbors=5, m=None, m_neighbors=10, out_step=0.5, kind='regular', svm_estimator=None, n_jobs=1)

fit(X, y)
Find the class statistics before performing sampling.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns:
self : object
Return self.

fit_sample(X, y)
Fit the statistics and resample the data directly.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns:
X_resampled : {array-like, sparse matrix}, shape (n_samples_new, n_features)
The array containing the resampled data.
y_resampled : array-like, shape (n_samples_new,)
The corresponding labels of X_resampled.

get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : boolean, optional
If True, will return the parameters for this estimator and contained sub-objects that are estimators.
Returns:
params : mapping of string to any
Parameter names mapped to their values.

sample(X, y)
Resample the dataset.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
Matrix containing the data to be sampled.
y : array-like, shape (n_samples,)
Corresponding label for each sample in X.
Returns:
X_resampled : {ndarray, sparse matrix}, shape (n_samples_new, n_features)
The array containing the resampled data.
y_resampled : ndarray, shape (n_samples_new,)
The corresponding labels of X_resampled.
 If