Target Encoder

class category_encoders.target_encoder.TargetEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, impute_missing=True, handle_unknown='impute', min_samples_leaf=1, smoothing=1)

Target encoding for categorical features, based on a leave-one-out approach.
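The description names a leave-one-out approach. The sketch below illustrates that general technique only; it is not the library's code. Each training row is encoded with the mean target of the *other* rows in its category, so a row's own label never leaks into its encoding. The helper name `leave_one_out_encode` is hypothetical, and falling back to the global mean for categories seen only once is an assumption.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Hypothetical helper: encode each row by the mean target of the other
    rows in its category (leave-one-out)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    # Global target mean, used when a category has no other rows (assumption).
    prior = sum(targets) / len(targets)
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            # Remove this row's own target from its category statistics.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            encoded.append(prior)
    return encoded


print(leave_one_out_encode(["a", "a", "b", "a", "c"], [1, 0, 1, 1, 0]))
# [0.5, 1.0, 0.6, 0.5, 0.6]
```

Note that two rows with the same category can receive different encodings when their targets differ. This is what keeps the encoding from simply memorizing each row's label.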

Parameters:
verbose: int

integer indicating verbosity of output. 0 for none.

cols: list

a list of columns to encode; if None, all string columns will be encoded

drop_invariant: bool

boolean for whether or not to drop columns with 0 variance

return_df: bool

boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array)

impute_missing: bool

boolean for whether or not to apply the logic for handle_unknown; this parameter will be deprecated in the future.

handle_unknown: str

options are ‘error’, ‘ignore’ and ‘impute’; defaults to ‘impute’, which will impute the category -1. Warning: if impute is used, an extra column will be added if the transform matrix has unknown categories. This can cause unexpected changes in dimension in some cases.

min_samples_leaf: int

minimum number of samples required to take the category average into account

smoothing: int

smoothing effect to balance the category average against the prior
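The min_samples_leaf and smoothing parameters control how far a category's average is pulled toward the prior (the global target mean). The following is a minimal sketch that assumes the sigmoid weighting described in [1], weight = 1 / (1 + exp(-(n - min_samples_leaf) / smoothing)), where n is the number of training rows in the category. The function name is hypothetical, and the library's exact formula may differ between versions.

```python
import math


def smoothed_category_mean(n, category_mean, prior, min_samples_leaf=1, smoothing=1.0):
    """Hypothetical helper: blend a category's target mean with the prior.

    The weight on category_mean approaches 1 as n grows past min_samples_leaf;
    smoothing controls how gradual that transition is (assumed sigmoid form).
    """
    z = (n - min_samples_leaf) / smoothing
    # Numerically stable logistic function: avoids math.exp overflow for large |z|.
    if z >= 0:
        weight = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        weight = e / (1.0 + e)
    return prior * (1.0 - weight) + category_mean * weight


for n in (1, 5, 50):
    print(n, round(smoothed_category_mean(n, category_mean=0.9, prior=0.5), 3))
```

A category seen exactly min_samples_leaf times gets a weight of 0.5, which places its encoding halfway between its own mean and the prior. Larger smoothing values make the shift from "trust the prior" to "trust the category" more gradual.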

References

[1] A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. Available from https://kaggle2.blob.core.windows.net/forum-message-attachments/225952/7441/high%20cardinality%20categoricals.pdf

Methods

fit(X, y, **kwargs) Fit encoder according to X and y.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X[, y]) Perform the transformation to new categorical data.
target_encode  

fit(X, y, **kwargs)

Fit encoder according to X and y.

Parameters:
X: array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y: array-like, shape = [n_samples]

Target values.

Returns:
self: encoder

Returns self.
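What fit learns can be pictured as a per-column lookup table from category to encoded value. The pure-Python sketch below works under stated assumptions: X is a dict mapping column names to lists, standing in for a DataFrame, and each category's mean is blended with the global prior using the sigmoid weighting from [1]. The names `fit_target_mapping` and `_sigmoid` are hypothetical and do not belong to the library's API.

```python
import math
from collections import defaultdict


def _sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_target_mapping(X, y, cols, min_samples_leaf=1, smoothing=1.0):
    """Hypothetical fit step: learn {column: {category: encoded value}}.

    X is a dict of column name -> list of categories; y is a list of numeric
    targets of the same length. Returns (prior, mapping).
    """
    prior = sum(y) / len(y)  # global target mean
    mapping = {}
    for col in cols:
        # Per-category target sum and row count.
        sums, counts = defaultdict(float), defaultdict(int)
        for value, target in zip(X[col], y):
            sums[value] += target
            counts[value] += 1
        col_map = {}
        for value, n in counts.items():
            # Assumed sigmoid weighting between category mean and prior.
            weight = _sigmoid((n - min_samples_leaf) / smoothing)
            col_map[value] = prior * (1.0 - weight) + (sums[value] / n) * weight
        mapping[col] = col_map
    return prior, mapping


X = {"color": ["red", "red", "blue", "green", "red"]}
y = [1, 1, 0, 1, 0]
prior, mapping = fit_target_mapping(X, y, cols=["color"])
print(prior, mapping)
```

Because "red" appears three times, its encoding sits closer to its own mean (2/3) than the single-occurrence categories, which are pulled halfway toward the prior.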

transform(X, y=None)

Perform the transformation to new categorical data.

Parameters:
X: array-like, shape = [n_samples, n_features]

y: array-like, shape = [n_samples] when transforming with leave-one-out, or None when transforming without target information (such as when transforming a test set).
Returns:
p : array, shape = [n_samples, n_numeric + N]

Transformed values with encoding applied.
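At transform time, each category is looked up in the table learned during fitting. The following is a hypothetical sketch that takes a precomputed mapping and prior. It fills categories unseen during fitting with the prior when handle_unknown="impute" and raises for "error". Filling with the prior is this sketch's own assumption and may not match the library's 'impute' behavior. The 'ignore' option is not modeled. Columns without a mapping pass through unchanged.

```python
def transform_with_mapping(X, mapping, prior, handle_unknown="impute"):
    """Hypothetical transform step: replace each category by its encoding.

    X is a dict of column name -> list of values; mapping is
    {column: {category: encoded value}}; prior is the global target mean.
    """
    out = {}
    for col, values in X.items():
        if col not in mapping:
            # Column was not encoded during fit: pass it through unchanged.
            out[col] = list(values)
            continue
        encoded = []
        for value in values:
            if value in mapping[col]:
                encoded.append(mapping[col][value])
            elif handle_unknown == "impute":
                # Assumption: unseen categories fall back to the prior.
                encoded.append(prior)
            elif handle_unknown == "error":
                raise ValueError(f"unseen category {value!r} in column {col!r}")
            else:
                raise NotImplementedError(
                    f"handle_unknown={handle_unknown!r} is not modeled in this sketch"
                )
        out[col] = encoded
    return out


mapping = {"color": {"red": 0.66, "blue": 0.3}}
test_X = {"color": ["blue", "purple", "red"], "size": [1, 2, 3]}
print(transform_with_mapping(test_X, mapping, prior=0.6))
# {'color': [0.3, 0.6, 0.66], 'size': [1, 2, 3]}
```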