Target Encoder

class category_encoders.target_encoder.TargetEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0)[source]

Target encoding for categorical features.

Supported targets: binomial and continuous. For polynomial (multi-class) target support, see PolynomialWrapper.

For a categorical target: each feature value is replaced with a blend of the posterior probability of the target given that categorical value and the prior probability of the target over all the training data.

For a continuous target: each feature value is replaced with a blend of the expected value of the target given that categorical value and the expected value of the target over all the training data.
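The blend described above can be sketched in plain Python. This is a minimal illustration, not the library's source: it assumes the sigmoid weighting from the referenced preprocessing scheme, where a category seen n times gets weight 1 / (1 + exp(-(n - min_samples_leaf) / smoothing)) on its own mean and the remainder on the global prior. The function name `target_encode_sketch` is hypothetical, and the exact formula used by a given library version may differ in detail.

```python
import math
from collections import defaultdict


def target_encode_sketch(categories, targets, min_samples_leaf=1, smoothing=1.0):
    """Hypothetical helper: map each category to a blend of its mean and the prior."""
    # Prior: mean of the target over all training rows.
    prior = sum(targets) / len(targets)

    # Per-category sums and counts.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    mapping = {}
    for category, n in counts.items():
        category_mean = sums[category] / n
        # Assumed sigmoid weight: approaches 1 for frequent categories.
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        mapping[category] = weight * category_mean + (1.0 - weight) * prior
    return mapping, prior


cats = ["a", "a", "a", "b", "b", "c"]
ys = [1, 1, 0, 0, 0, 1]
mapping, prior = target_encode_sketch(cats, ys)

# Encode new values; an unseen category falls back to the prior,
# mirroring the documented handle_unknown='value' behaviour.
encoded = [mapping.get(c, prior) for c in ["a", "c", "z"]]
print(encoded)
```

For a binary target the per-category mean is the posterior probability and the prior is the overall positive rate, so the same code covers both the categorical and the continuous case.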

Parameters
verbose: int

integer indicating verbosity of the output. 0 for none.

cols: list

a list of columns to encode; if None, all string columns are encoded.

drop_invariant: bool

boolean for whether or not to drop columns with 0 variance.

return_df: bool

boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).

handle_missing: str

options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.

handle_unknown: str

options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.

min_samples_leaf: int

minimum number of samples a category needs for its average to be taken into account.

smoothing: float

smoothing effect to balance the categorical average against the prior. A higher value means stronger regularization. The value must be strictly greater than 0.
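To get a feel for how min_samples_leaf and smoothing interact, the sketch below prints the weight given to the category average for a few category counts. It assumes a sigmoid weight of the form 1 / (1 + exp(-(n - min_samples_leaf) / smoothing)); `blend_weight` is a hypothetical helper, not part of the library, and the library's internal formula may differ between versions.

```python
import math


def blend_weight(n, min_samples_leaf=1, smoothing=1.0):
    """Hypothetical helper: weight on the category average for a category seen n times."""
    return 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))


# Larger smoothing flattens the curve, so categories with more than
# min_samples_leaf rows are pulled further toward the prior.
for smoothing in (0.5, 1.0, 10.0):
    weights = [round(blend_weight(n, smoothing=smoothing), 3) for n in (1, 2, 5, 20)]
    print(f"smoothing={smoothing}: {weights}")
```

Under this assumption, a category with exactly min_samples_leaf rows always gets weight 0.5, which makes min_samples_leaf the point where the category average and the prior count equally.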

References

[1] A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems, from https://dl.acm.org/citation.cfm?id=507538

Methods

fit(X, y, **kwargs)

Fit encoder according to X and y.

fit_transform(X[, y])

Encoders that utilize the target must make sure that the training data are transformed with transform(X, y) and not with transform(X).

get_feature_names()

Returns the names of all transformed / added columns.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y, override_return_df])

Perform the transformation to new categorical data.

fit_target_encoding

target_encode

fit(X, y, **kwargs)[source]

Fit encoder according to X and y.

Parameters
X: array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y: array-like, shape = [n_samples]

Target values.

Returns
self: encoder

Returns self.

get_feature_names()[source]

Returns the names of all transformed / added columns.

Returns
feature_names: list

A list with all feature names transformed or added. Note: potentially dropped features are not included!

transform(X, y=None, override_return_df=False)[source]

Perform the transformation to new categorical data.

Parameters
X: array-like, shape = [n_samples, n_features]
y: array-like, shape = [n_samples], or None

Pass the target when transforming by leave-one-out; pass None when transforming without target information (for example, when transforming a test set).

Returns
p: array, shape = [n_samples, n_numeric + N]

Transformed values with encoding applied.