Leave One Out

class category_encoders.leave_one_out.LeaveOneOutEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_unknown='value', handle_missing='value', random_state=None, sigma=None)
Leave one out coding for categorical features.
This is very similar to target encoding, but it excludes the current row’s target when calculating the mean target for a level, which reduces the effect of outliers.
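The idea can be sketched in plain Python, independent of the library. This is a minimal illustration, not the encoder's implementation: the function name `leave_one_out_means` and the sample data are hypothetical, and falling back to the global target mean for a level that occurs only once is an assumption made here.

```python
from collections import defaultdict


def leave_one_out_means(categories, targets):
    """For each row, return the mean target of the OTHER rows sharing its level."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, t in zip(categories, targets):
        sums[cat] += t
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)

    encoded = []
    for cat, t in zip(categories, targets):
        n = counts[cat]
        if n > 1:
            # Remove the current row's own target before averaging.
            encoded.append((sums[cat] - t) / (n - 1))
        else:
            # Assumption: a level seen only once falls back to the global mean.
            encoded.append(global_mean)
    return encoded


colors = ["red", "red", "red", "blue", "blue", "green"]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
encoded = leave_one_out_means(colors, y)
print(encoded)
```

Each "red" row is encoded from the two other "red" targets only, so a row's own target never leaks into its own feature value.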
Parameters:
 verbose: int
integer indicating verbosity of the output. 0 for none.
 cols: list
a list of columns to encode; if None, all string columns will be encoded.
 drop_invariant: bool
boolean for whether or not to drop columns with 0 variance.
 return_df: bool
boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
 handle_missing: str
options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.
 handle_unknown: str
options are ‘error’, ‘return_nan’ and ‘value’, defaults to ‘value’, which returns the target mean.
 sigma: float
adds normally distributed (Gaussian) noise to the training data in order to decrease overfitting (test data are left untouched). Sigma gives the standard deviation (spread or “width”) of the normal distribution. The optimal value is commonly between 0.05 and 0.6. The default is to add no noise, but that leads to significantly suboptimal results.
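The effect of sigma can be illustrated with a small standard-library sketch. The exact noise formula is not stated in this section. The version below assumes multiplicative noise drawn from a normal distribution with mean 1 and standard deviation sigma. The helper `add_multiplicative_noise` and its `seed` argument are hypothetical names used only for illustration.

```python
import random


def add_multiplicative_noise(encoded, sigma, seed=None):
    """Scale each encoded training value by a draw from N(1, sigma).

    Assumption: noise is multiplicative and centered at 1, so a larger
    sigma spreads the training values further from their clean encoding.
    """
    rng = random.Random(seed)
    return [value * rng.gauss(1.0, sigma) for value in encoded]


train_encoded = [0.5, 1.0, 0.5, 1.0, 0.0]

# Noise is applied to training values only; test values would be left as-is.
noisy = add_multiplicative_noise(train_encoded, sigma=0.05, seed=0)
same_seed = add_multiplicative_noise(train_encoded, sigma=0.05, seed=0)
print(noisy)
```

Fixing the seed makes the noise reproducible, which is the role a parameter such as random_state typically plays.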
References
[1] Strategies to encode categorical variables with many categories, from https://www.kaggle.com/c/caterpillartubepricing/discussion/15748#143154.
Methods
fit(self, X, y, **kwargs)  Fit encoder according to X and y.
fit_transform(self, X[, y])  Encoders that utilize the target must make sure that the training data are transformed with transform(X, y) and not with transform(X).
get_feature_names(self)  Returns the names of all transformed / added columns.
get_params(self[, deep])  Get parameters for this estimator.
set_params(self, **params)  Set the parameters of this estimator.
transform(self, X[, y, override_return_df])  Perform the transformation to new categorical data.
transform_leave_one_out(self, X_in, y[, mapping])  Leave one out encoding uses a single column of floats to represent the means of the target variables.
fit_column_map
fit_leave_one_out

fit(self, X, y, **kwargs)
Fit encoder according to X and y.
Parameters:
 X : array-like, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
 y : array-like, shape = [n_samples]
Target values.
Returns:
 self : encoder
Returns self.

fit_transform(self, X, y=None, **fit_params)
Encoders that utilize the target must make sure that the training data are transformed with:
 transform(X, y)
and not with:
 transform(X)

get_feature_names(self)
Returns the names of all transformed / added columns.
Returns:
 feature_names: list
A list with all feature names transformed or added. Note: potentially dropped features are not included.

transform(self, X, y=None, override_return_df=False)
Perform the transformation to new categorical data.
Parameters:
 X : array-like, shape = [n_samples, n_features]
 y : array-like, shape = [n_samples], or None
Target values when transforming with leave one out (training data); None when transforming without target information (such as when transforming a test set).
Returns:
 p : array, shape = [n_samples, n_numeric + N]
Transformed values with encoding applied.
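The difference between transforming with and without y can be sketched with a small stand-in class. `TinyLeaveOneOut` is hypothetical and handles a single column of plain Python values. It is not the library's implementation and only mirrors the documented call pattern. Encoding unseen levels and single-occurrence levels with the global target mean is an assumption, chosen to match the 'value' option described above.

```python
from collections import defaultdict


class TinyLeaveOneOut:
    """Hypothetical single-column stand-in, for illustration only."""

    def fit(self, categories, targets):
        self.sums_ = defaultdict(float)
        self.counts_ = defaultdict(int)
        for cat, t in zip(categories, targets):
            self.sums_[cat] += t
            self.counts_[cat] += 1
        self.global_mean_ = sum(targets) / len(targets)
        return self

    def transform(self, categories, targets=None):
        out = []
        for i, cat in enumerate(categories):
            n = self.counts_.get(cat, 0)
            if n == 0:
                # Unknown level: assumed to fall back to the global mean.
                out.append(self.global_mean_)
            elif targets is None:
                # No target given (test data): use the full level mean.
                out.append(self.sums_[cat] / n)
            elif n > 1:
                # Target given (training data): leave the current row out.
                out.append((self.sums_[cat] - targets[i]) / (n - 1))
            else:
                # Single-occurrence level: assumed global-mean fallback.
                out.append(self.global_mean_)
        return out


train_cats = ["a", "a", "b", "b", "b"]
train_y = [1.0, 0.0, 1.0, 1.0, 0.0]

enc = TinyLeaveOneOut().fit(train_cats, train_y)
train_encoded = enc.transform(train_cats, train_y)  # training: pass y
test_encoded = enc.transform(["a", "b", "c"])       # test: no y
print(train_encoded)
print(test_encoded)
```

Calling transform on the training rows without y would give every row the full level mean, which includes that row's own target. That is the leakage the transform(X, y) rule is meant to avoid.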