Target Encoder

class category_encoders.target_encoder.TargetEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0)
Target encoding for categorical features.
For a categorical target, each feature value is replaced with a blend of the posterior probability of the target given that categorical value and the prior probability of the target over all the training data.
For a continuous target, each feature value is replaced with a blend of the expected value of the target given that categorical value and the expected value of the target over all the training data.
Parameters:  verbose: int
integer indicating verbosity of the output. 0 for none.
 cols: list
a list of columns to encode; if None, all string columns will be encoded.
 drop_invariant: bool
boolean for whether or not to drop columns with 0 variance.
 return_df: bool
boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
 handle_missing: str
options are 'error', 'return_nan' and 'value'; defaults to 'value', which returns the target mean.
 handle_unknown: str
options are 'error', 'return_nan' and 'value'; defaults to 'value', which returns the target mean.
 min_samples_leaf: int
minimum number of samples a category needs for its own average to be taken into account.
 smoothing: float
smoothing effect to balance the categorical average against the prior. A higher value means stronger regularization. The value must be strictly greater than 0.
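The blend described above, and the roles of min_samples_leaf and smoothing, can be sketched in plain Python. This is a minimal illustration, not the library's code: the function name target_encode_sketch is hypothetical, and the sigmoid weighting is an assumption about how the blend is computed. The exact formula is not stated on this page and may differ between library versions.

```python
import math
from collections import defaultdict


def target_encode_sketch(categories, targets, min_samples_leaf=1, smoothing=1.0):
    """Map each category to a blend of its own target mean and the global mean.

    Hypothetical helper for illustration; the weighting is an assumed form.
    """
    prior = sum(targets) / len(targets)  # target mean over all training data

    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, t in zip(categories, targets):
        sums[cat] += t
        counts[cat] += 1

    encoding = {}
    for cat, n in counts.items():
        category_mean = sums[cat] / n
        # Assumed sigmoid weight: small for rare categories, near 1 for
        # frequent ones. A larger smoothing flattens the curve, so the
        # prior keeps more influence.
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        encoding[cat] = prior * (1.0 - weight) + category_mean * weight
    return prior, encoding


# Toy data: category "a" appears four times, "b" only once.
cats = ["a", "a", "a", "a", "b"]
ys = [1, 1, 1, 0, 0]
prior, encoding = target_encode_sketch(cats, ys)
_, encoding_smooth = target_encode_sketch(cats, ys, smoothing=10.0)
```

In this sketch the frequent category "a" lands close to its own mean, the rare category "b" is pulled halfway toward the prior, and raising smoothing moves "a" closer to the prior as well.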
References
[1] A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems, from https://dl.acm.org/citation.cfm?id=507538
Methods
fit(self, X, y, **kwargs)  Fit encoder according to X and y.
fit_transform(self, X[, y])  Encoders that utilize the target must make sure that the training data are transformed with transform(X, y) and not with transform(X).
get_feature_names(self)  Returns the names of all transformed / added columns.
get_params(self[, deep])  Get parameters for this estimator.
set_params(self, **params)  Set the parameters of this estimator.
transform(self, X[, y, override_return_df])  Perform the transformation to new categorical data.
fit_target_encoding
target_encode

fit(self, X, y, **kwargs)
Fit encoder according to X and y.
Parameters:  X : array-like, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
 y : array-like, shape = [n_samples]
Target values.
Returns:  self : encoder
Returns self.

fit_transform(self, X, y=None, **fit_params)
Encoders that utilize the target must make sure that the training data are transformed with:
 transform(X, y)
 and not with:
 transform(X)

get_feature_names(self)
Returns the names of all transformed / added columns.
Returns:  feature_names: list
A list with all feature names transformed or added. Note: potentially dropped features are not included!

transform(self, X, y=None, override_return_df=False)
Perform the transformation to new categorical data.
Parameters:  X : array-like, shape = [n_samples, n_features]
 y : array-like, shape = [n_samples], when transforming by leave-one-out
None when transforming without target information (for example, when transforming a test set).
Returns:  p : array, shape = [n_samples, n_numeric + N]
Transformed values with encoding applied.