Category Encoders
A set of scikit-learn-style transformers for encoding categorical variables as numeric features using a variety of techniques. While the ordinal, one-hot, and hashing encoders have close equivalents in scikit-learn, the transformers in this library all share a few useful properties:
- First-class support for pandas dataframes as input (and optionally as output)
- Explicit configuration of which columns are encoded, by name or index, or automatic inference of non-numeric columns regardless of input type
- Optional dropping of columns with very low variance in the training set
- Portability: train a transformer on data, pickle it, reuse it later, and get the same output
- Full compatibility with sklearn pipelines: pass in an array-like dataset as with any other transformer
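The portability property rests on the fact that a fitted encoder is just learned state. The following minimal sketch assumes a hand-rolled ordinal encoder (fit_ordinal and transform_ordinal are hypothetical helpers, not part of category_encoders) and shows that pickling and reloading the fitted state yields identical transforms. Mapping unseen categories to -1 is a choice made for this sketch only.

```python
import pickle


def fit_ordinal(rows, cols):
    """Learn a category -> integer mapping per column, numbering from 1
    in order of first appearance (a sketch, not the library's code)."""
    mapping = {}
    for col in cols:
        seen = {}
        for row in rows:
            seen.setdefault(row[col], len(seen) + 1)
        mapping[col] = seen
    return mapping


def transform_ordinal(mapping, rows):
    """Replace categories with their learned integers; unseen values map to -1."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, codes in mapping.items():
            new_row[col] = codes.get(row[col], -1)
        out.append(new_row)
    return out


train = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 3},
]
fitted = fit_ordinal(train, cols=["color"])

# Persist and reload the fitted state, as you would when shipping a trained encoder
restored = pickle.loads(pickle.dumps(fitted))

new_data = [{"color": "blue", "size": 5}, {"color": "green", "size": 6}]
before = transform_ordinal(fitted, new_data)
after = transform_ordinal(restored, new_data)
print(before)  # [{'color': 2, 'size': 5}, {'color': -1, 'size': 6}]
print(before == after)  # True
```

Because the fitted state here is plain data, any process that loads it reproduces the same encoding, which is the behavior the portability property describes.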
Usage
Install with:
pip install category_encoders
or
conda install -c conda-forge category_encoders
To use:
import category_encoders as ce
# Choose one of the following encoders; cols lists the columns to encode
encoder = ce.BackwardDifferenceEncoder(cols=[...])
encoder = ce.BaseNEncoder(cols=[...])
encoder = ce.BinaryEncoder(cols=[...])
encoder = ce.CatBoostEncoder(cols=[...])
encoder = ce.CountEncoder(cols=[...])
encoder = ce.GLMMEncoder(cols=[...])
encoder = ce.GrayEncoder(cols=[...])
encoder = ce.HashingEncoder(cols=[...])
encoder = ce.HelmertEncoder(cols=[...])
encoder = ce.JamesSteinEncoder(cols=[...])
encoder = ce.LeaveOneOutEncoder(cols=[...])
encoder = ce.MEstimateEncoder(cols=[...])
encoder = ce.OneHotEncoder(cols=[...])
encoder = ce.OrdinalEncoder(cols=[...])
encoder = ce.SumEncoder(cols=[...])
encoder = ce.PolynomialEncoder(cols=[...])
encoder = ce.TargetEncoder(cols=[...])
encoder = ce.WOEEncoder(cols=[...])
encoder = ce.QuantileEncoder(cols=[...])
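# X: training features; y: target (required by supervised encoders such as TargetEncoder)
encoder.fit(X, y)
# X_dirty: new data with the same columns as X
X_cleaned = encoder.transform(X_dirty)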
All of these are fully compatible sklearn transformers, so they can be used in pipelines or in your existing scripts. If the cols parameter isn't passed, every non-numeric column is encoded. See below for detailed documentation.
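To illustrate the default column selection described above, here is a minimal sketch that picks every column containing a non-numeric value. The helper infer_categorical_cols is hypothetical; the library presumably decides from pandas column dtypes rather than scanning values, so edge cases such as missing values may be handled differently.

```python
from numbers import Number


def infer_categorical_cols(columns):
    """Return the names of columns that contain any non-numeric value.

    `columns` maps a column name to its list of values. Booleans count as
    numeric here (bool is a subclass of int), a choice made for this sketch.
    """
    return [
        name
        for name, values in columns.items()
        if any(not isinstance(v, Number) for v in values)
    ]


data = {
    "age": [23, 35, 41],
    "city": ["Paris", "Oslo", "Paris"],
    "score": [0.5, 1.5, 2.0],
    "plan": ["free", "pro", "free"],
}
print(infer_categorical_cols(data))  # ['city', 'plan']
```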
Contents:
- Backward Difference Coding
BackwardDifferenceEncoder
BackwardDifferenceEncoder.fit()
BackwardDifferenceEncoder.fit_transform()
BackwardDifferenceEncoder.get_feature_names_in()
BackwardDifferenceEncoder.get_feature_names_out()
BackwardDifferenceEncoder.get_params()
BackwardDifferenceEncoder.set_output()
BackwardDifferenceEncoder.set_params()
BackwardDifferenceEncoder.transform()
- BaseN
BaseNEncoder
BaseNEncoder.basen_encode()
BaseNEncoder.basen_to_integer()
BaseNEncoder.col_transform()
BaseNEncoder.fit()
BaseNEncoder.fit_transform()
BaseNEncoder.get_feature_names_in()
BaseNEncoder.get_feature_names_out()
BaseNEncoder.get_params()
BaseNEncoder.inverse_transform()
BaseNEncoder.set_output()
BaseNEncoder.set_params()
BaseNEncoder.transform()
- Binary
BinaryEncoder
BinaryEncoder.basen_encode()
BinaryEncoder.basen_to_integer()
BinaryEncoder.col_transform()
BinaryEncoder.fit()
BinaryEncoder.fit_transform()
BinaryEncoder.get_feature_names_in()
BinaryEncoder.get_feature_names_out()
BinaryEncoder.get_params()
BinaryEncoder.inverse_transform()
BinaryEncoder.set_output()
BinaryEncoder.set_params()
BinaryEncoder.transform()
- CatBoost Encoder
- Count Encoder
- Generalized Linear Mixed Model Encoder
- Gray
- Hashing
- Helmert Coding
- James-Stein Encoder
- Leave One Out
LeaveOneOutEncoder
LeaveOneOutEncoder.fit()
LeaveOneOutEncoder.fit_transform()
LeaveOneOutEncoder.get_feature_names_in()
LeaveOneOutEncoder.get_feature_names_out()
LeaveOneOutEncoder.get_params()
LeaveOneOutEncoder.set_output()
LeaveOneOutEncoder.set_params()
LeaveOneOutEncoder.transform()
LeaveOneOutEncoder.transform_leave_one_out()
- M-estimate
- One Hot
OneHotEncoder
OneHotEncoder.fit()
OneHotEncoder.fit_transform()
OneHotEncoder.get_dummies()
OneHotEncoder.get_feature_names_in()
OneHotEncoder.get_feature_names_out()
OneHotEncoder.get_params()
OneHotEncoder.inverse_transform()
OneHotEncoder.reverse_dummies()
OneHotEncoder.set_output()
OneHotEncoder.set_params()
OneHotEncoder.transform()
- Ordinal
OrdinalEncoder
OrdinalEncoder.fit()
OrdinalEncoder.fit_transform()
OrdinalEncoder.get_feature_names_in()
OrdinalEncoder.get_feature_names_out()
OrdinalEncoder.get_params()
OrdinalEncoder.inverse_transform()
OrdinalEncoder.ordinal_encoding()
OrdinalEncoder.set_output()
OrdinalEncoder.set_params()
OrdinalEncoder.transform()
- Polynomial Coding
- Quantile Encoder
- Sum Coding
- Summary Encoder
- Target Encoder
- Weight of Evidence
- Wrappers
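Several entries in the list above (Ordinal, BaseN, Binary, Gray) can be read as two steps: map each category to an integer, then spread that integer over a few digit columns. The sketch below is a hand-rolled illustration of that idea, not the library's implementation; the helper names are hypothetical, and the library's exact column count and its handling of unknown or missing values may differ.

```python
def width_for(max_value, base):
    """Smallest number of base-`base` digits that can represent max_value."""
    width = 1
    while base ** width <= max_value:
        width += 1
    return width


def to_base_n(value, base, width):
    """Write a non-negative integer as a fixed-width list of base-`base` digits,
    most significant digit first."""
    digits = []
    for _ in range(width):
        digits.append(value % base)
        value //= base
    return digits[::-1]


def gray_code(value):
    """Reflected binary (Gray) code: consecutive integers differ in exactly one bit."""
    return value ^ (value >> 1)


categories = ["red", "green", "blue", "yellow", "purple"]
# Step 1: ordinal pass, numbering categories from 1 in order of appearance
ordinals = {cat: i + 1 for i, cat in enumerate(categories)}

max_code = max(ordinals.values())
w2, w3 = width_for(max_code, 2), width_for(max_code, 3)

# Step 2: spread each ordinal over a few digit columns
for cat, code in ordinals.items():
    binary = to_base_n(code, 2, w2)
    gray = to_base_n(gray_code(code), 2, w2)
    base3 = to_base_n(code, 3, w3)
    print(f"{cat:<7} ordinal={code} binary={binary} gray={gray} base3={base3}")
```

Five categories would need five one-hot columns, but only three binary columns or two base-3 columns; for example, purple (ordinal 5) becomes [1, 0, 1] in binary and [1, 2] in base 3.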
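The Count Encoder and the target-based entries in the list above (for example M-estimate and Leave One Out) replace each category with a statistic computed from the training data. The sketch below uses the textbook formulas and hypothetical helper names; it is not the library's code, and the library's defaults, its handling of unseen or missing categories, and the smoothing used by Target Encoder may differ. Falling back to the global mean for single-row categories is a choice made for this sketch.

```python
from collections import defaultdict


def category_stats(categories, targets):
    """Per-category target sum and row count."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return sums, counts


def m_estimate(categories, targets, m=1.0):
    """Shrink each category mean toward the global mean: (sum + m*prior) / (count + m)."""
    sums, counts = category_stats(categories, targets)
    prior = sum(targets) / len(targets)
    return {cat: (sums[cat] + m * prior) / (counts[cat] + m) for cat in counts}


def leave_one_out(categories, targets):
    """Encode each training row with the mean target of the other rows in its
    category; singleton categories fall back to the global mean (sketch choice)."""
    sums, counts = category_stats(categories, targets)
    prior = sum(targets) / len(targets)
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            encoded.append(prior)
    return encoded


cats = ["a", "a", "a", "b", "b", "c"]
ys = [1, 0, 1, 1, 1, 0]

_, counts = category_stats(cats, ys)
print(dict(counts))  # count encoding: {'a': 3, 'b': 2, 'c': 1}
print(m_estimate(cats, ys, m=1.0))
print(leave_one_out(cats, ys))
```

Leaving each row's own target out of its encoding is meant to reduce target leakage on the training set; new rows at prediction time would typically use the full category mean instead.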