Leave One Out

class category_encoders.leave_one_out.LeaveOneOutEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_unknown='value', handle_missing='value', random_state=None, sigma=None)

Leave one out coding for categorical features.

This is very similar to target encoding, but it excludes the current row’s target when calculating the mean target for a level, which reduces the effect of outliers.
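The idea can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the library's implementation: the function name leave_one_out_means and the sample data are hypothetical, and falling back to the global target mean for a level that occurs only once is an assumption.

```python
from collections import defaultdict


def leave_one_out_means(categories, targets):
    """Encode each row with the mean target of the other rows in its level."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, target in zip(categories, targets):
        sums[level] += target
        counts[level] += 1

    global_mean = sum(targets) / len(targets)

    encoded = []
    for level, target in zip(categories, targets):
        if counts[level] > 1:
            # Remove the current row's target before averaging.
            encoded.append((sums[level] - target) / (counts[level] - 1))
        else:
            # Assumption: a level seen only once has no other rows to
            # average, so fall back to the global target mean.
            encoded.append(global_mean)
    return encoded


cats = ["a", "a", "a", "b", "b", "c"]
ys = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
encoded = leave_one_out_means(cats, ys)
print(encoded)
```

For level "a" the targets sum to 2.0 over three rows, so the first row is encoded as (2.0 - 1.0) / 2 = 0.5 and the second as (2.0 - 0.0) / 2 = 1.0.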

Parameters:
verbose: int

integer indicating verbosity of the output. 0 for none.

cols: list

a list of columns to encode. If None, all string columns will be encoded.

drop_invariant: bool

boolean for whether or not to drop columns with 0 variance.

return_df: bool

boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).

handle_missing: str

options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.

handle_unknown: str

options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.

sigma: float

adds normal (Gaussian) distribution noise into training data in order to decrease overfitting (testing data are untouched). Sigma gives the standard deviation (spread or “width”) of the normal distribution. The optimal value is commonly between 0.05 and 0.6. The default is to not add noise, but that leads to significantly suboptimal results.
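A rough sketch of what noise controlled by sigma could look like. The helper add_training_noise is hypothetical, and multiplying each encoded value by a draw from a normal distribution centred on 1.0 is an assumption about how the noise is applied; the description above only states that Gaussian noise with standard deviation sigma is added to training data.

```python
import random


def add_training_noise(encoded, sigma, seed=None):
    """Return a noisy copy of encoded training values.

    Assumption: the noise is multiplicative, i.e. each value is scaled by
    a draw from Normal(mean=1.0, std=sigma). With sigma=None no noise is added.
    """
    if sigma is None:
        return list(encoded)
    rng = random.Random(seed)
    return [value * rng.gauss(1.0, sigma) for value in encoded]


train_encoded = [0.5, 1.0, 0.5, 1.0, 0.0]
noisy = add_training_noise(train_encoded, sigma=0.05, seed=0)
unchanged = add_training_noise(train_encoded, sigma=None)
print(noisy)
```

Only the training encodings would be passed through such a step; values computed for test data are left as they are.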

References

[1] Strategies to encode categorical variables with many categories, from https://www.kaggle.com/c/caterpillar-tube-pricing/discussion/15748#143154.

Methods

fit(self, X, y, **kwargs) Fit encoder according to X and y.
fit_transform(self, X[, y]) Fit the encoder and transform the training data with transform(X, y), not transform(X).
get_feature_names(self) Returns the names of all transformed / added columns.
get_params(self[, deep]) Get parameters for this estimator.
set_params(self, **params) Set the parameters of this estimator.
transform(self, X[, y, override_return_df]) Perform the transformation to new categorical data.
transform_leave_one_out(self, X_in, y[, mapping]) Leave one out encoding uses a single column of floats to represent the means of the target variables.
fit_column_map
fit_leave_one_out

fit(self, X, y, **kwargs)

Fit encoder according to X and y.

Parameters:
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

Returns:
self : encoder

Returns self.

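fit_transform(self, X, y=None, **fit_params)

Encoders that utilize the target must make sure that the training data are transformed with:

transform(X, y)

and not with:

transform(X)

Passing y lets the encoder exclude each training row’s own target. Calling transform(X) is meant for data without target information, such as a test set.
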
get_feature_names(self)

Returns the names of all transformed / added columns.

Returns:
feature_names: list

A list with all feature names transformed or added. Note: potentially dropped features are not included!

transform(self, X, y=None, override_return_df=False)

Perform the transformation to new categorical data.

Parameters:
X : array-like, shape = [n_samples, n_features]
y : array-like, shape = [n_samples], or None

Target values when transforming with leave one out (training data). None when transforming without target information (such as a test set).

Returns:
p : array, shape = [n_samples, n_numeric + N]

Transformed values with encoding applied.
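When y is None there is no target to leave out. A minimal sketch of that case, assuming each level is then encoded with its full training mean and that unseen levels fall back to the overall target mean (the behaviour described for handle_unknown='value'). The helper names fit_level_means and encode_without_target are hypothetical and not part of the library.

```python
from collections import defaultdict


def fit_level_means(categories, targets):
    """Collect the mean target per level and the global target mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, target in zip(categories, targets):
        sums[level] += target
        counts[level] += 1
    level_means = {level: sums[level] / counts[level] for level in sums}
    global_mean = sum(targets) / len(targets)
    return level_means, global_mean


def encode_without_target(categories, level_means, global_mean):
    """Encode rows that have no target, e.g. a test set."""
    # Unseen levels receive the global target mean.
    return [level_means.get(level, global_mean) for level in categories]


level_means, global_mean = fit_level_means(
    ["a", "a", "a", "b", "b"], [1.0, 0.0, 1.0, 0.0, 1.0]
)
encoded_test_rows = encode_without_target(["a", "b", "z"], level_means, global_mean)
print(encoded_test_rows)
```

Here "a" maps to 2/3, "b" to 0.5, and the unseen level "z" to the global mean 0.6.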

transform_leave_one_out(self, X_in, y, mapping=None)

Leave one out encoding uses a single column of floats to represent the means of the target variables.