Introduction
lightning is composed of three modules: classification, regression and ranking. Several solvers are available in each.

If you're not sure which solver to use, just go for classification.CDClassifier / regression.CDRegressor or classification.SDCAClassifier / regression.SDCARegressor. They are very fast and do not require any tedious tuning of a learning rate.
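For example, training one of the recommended solvers looks like this. This is a minimal sketch assuming lightning's scikit-learn-compatible fit/predict interface; the penalty="l1" and alpha settings are illustrative choices, not tuned recommendations:

    from sklearn.datasets import make_classification
    from lightning.classification import CDClassifier

    # Toy binary classification problem.
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # L1-penalized primal coordinate descent: no learning rate to tune.
    clf = CDClassifier(penalty="l1", alpha=1e-3, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))  # mean accuracy on the training data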
Primal coordinate descent
classification.CDClassifier, regression.CDRegressor
Main idea: update a single coordinate at a time (closed-form update when possible, coordinate-wise gradient descent otherwise); see the sketch after this list
Non-smooth losses: No
Penalties: L2, L1, L1/L2
Learning rate: No
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
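To make the coordinate-wise update concrete, here is a minimal numpy sketch of cyclic coordinate descent for the L1-penalized squared loss (the lasso), where the closed-form update is a soft-thresholding step. It illustrates the idea only and is not lightning's implementation:

    import numpy as np

    def soft_threshold(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def cd_lasso(X, y, alpha=0.1, n_iter=100):
        """Cyclic coordinate descent for min_w ||y - Xw||^2 / (2n) + alpha * ||w||_1."""
        n, d = X.shape
        w = np.zeros(d)
        col_sq = (X ** 2).sum(axis=0) / n    # per-coordinate curvature
        r = y - X @ w                        # residual, kept in sync with w
        for _ in range(n_iter):
            for j in range(d):
                r += X[:, j] * w[j]          # remove coordinate j's contribution
                rho = X[:, j] @ r / n
                w[j] = soft_threshold(rho, alpha) / col_sq[j]  # closed-form update
                r -= X[:, j] * w[j]          # restore the residual
        return w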
Dual coordinate ascent
classification.LinearSVC, regression.LinearSVR (L2-regularization, supports shrinking)
classification.SDCAClassifier, regression.SDCARegressor (Elastic-net, supports many losses)
Main idea: update a single dual coordinate at a time (a closed-form solution is available for many loss functions); see the sketch after this list
Non-smooth losses: Yes
Penalties: L2, Elastic-net
Learning rate: No
Multiclass: one-vs-rest
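As an illustration, here is a numpy sketch of dual coordinate ascent for the L2-regularized hinge loss (a linear SVM with labels in {-1, +1}). The clipped closed-form update below is the textbook one, not lightning's exact code:

    import numpy as np

    def dca_linear_svm(X, y, C=1.0, n_epochs=20, seed=0):
        """Dual coordinate ascent for the hinge-loss SVM.
        Maintains w = sum_i alpha_i * y_i * x_i while solving one dual
        coordinate alpha_i at a time in closed form."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        alpha = np.zeros(n)
        w = np.zeros(d)
        q = (X ** 2).sum(axis=1)                 # diagonal of the Gram matrix
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                g = y[i] * (X[i] @ w) - 1.0      # gradient along dual coordinate i
                a_new = np.clip(alpha[i] - g / q[i], 0.0, C)  # closed form, clipped to [0, C]
                w += (a_new - alpha[i]) * y[i] * X[i]
                alpha[i] = a_new
        return w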
FISTA
classification.FistaClassifier, regression.FistaRegressor
Main idea: accelerated proximal gradient method (uses full gradients); see the sketch after this list
Non-smooth losses: No
Penalties: L1, L1/L2, Trace/Nuclear
Learning rate: No
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
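A compact numpy sketch of FISTA for the lasso shows both ingredients named above: a proximal step on the full gradient and Nesterov-style momentum. Illustrative only:

    import numpy as np

    def fista_lasso(X, y, alpha=0.1, n_iter=200):
        """FISTA for min_w ||y - Xw||^2 / (2n) + alpha * ||w||_1."""
        n, d = X.shape
        L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the smooth part
        w = np.zeros(d)
        z = w.copy()                           # momentum ("look-ahead") point
        t = 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ z - y) / n       # full gradient at the look-ahead point
            u = z - grad / L
            w_new = np.sign(u) * np.maximum(np.abs(u) - alpha / L, 0.0)  # prox of the L1 term
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = w_new + ((t - 1.0) / t_new) * (w_new - w)  # Nesterov extrapolation
            w, t = w_new, t_new
        return w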
Stochastic gradient method (SGD)
classification.SGDClassifier, regression.SGDRegressor
Main idea: replace the full gradient with a stochastic estimate obtained from a single sample; see the sketch after this list
Non-smooth losses: Yes
Penalties: L2, L1, L1/L2
Learning rate: Yes (very sensitive)
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
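The sketch below shows plain SGD for L2-regularized logistic regression (labels in {-1, +1}); eta0 plays the role of the sensitive learning rate mentioned above, and the decay schedule is one common choice, not lightning's:

    import numpy as np

    def sgd_logistic(X, y, alpha=1e-4, eta0=1.0, n_epochs=10, seed=0):
        """SGD for L2-regularized logistic regression."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                t += 1
                eta = eta0 / (1.0 + eta0 * alpha * t)        # decaying step size
                s = 1.0 / (1.0 + np.exp(y[i] * (X[i] @ w)))  # sigmoid(-y * w.x)
                grad = -y[i] * s * X[i] + alpha * w          # single-sample gradient estimate
                w -= eta * grad
        return w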
AdaGrad
classification.AdaGradClassifier, regression.AdaGradRegressor
Main idea: use per-feature learning rates (features that occur frequently in the gradients get small learning rates, while infrequent features get larger ones); see the sketch after this list
Non-smooth losses: Yes
Penalties: L2, Elastic-net
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest
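The per-feature step sizes are easy to see in code: AdaGrad divides a global rate by the root of each feature's accumulated squared gradients. A minimal sketch for logistic loss, not lightning's implementation:

    import numpy as np

    def adagrad_logistic(X, y, eta=1.0, n_epochs=10, eps=1e-8, seed=0):
        """AdaGrad for logistic regression (labels in {-1, +1})."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        g_sq = np.zeros(d)                   # accumulated squared gradients per feature
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                s = 1.0 / (1.0 + np.exp(y[i] * (X[i] @ w)))  # sigmoid(-y * w.x)
                grad = -y[i] * s * X[i]
                g_sq += grad ** 2
                # Frequently updated features accumulate larger g_sq,
                # hence receive smaller effective step sizes.
                w -= eta * grad / (np.sqrt(g_sq) + eps)
        return w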
Stochastic averaged gradient (SAG and SAGA)
classification.SAGClassifier, classification.SAGAClassifier, regression.SAGRegressor, regression.SAGARegressor
Main idea: instead of using the full gradient (the average of the sample-wise gradients), compute the gradient of a randomly selected sample and reuse outdated gradients for the other samples; see the sketch after this list
Non-smooth losses: Yes (classification.SAGAClassifier and regression.SAGARegressor)
Penalties: L1, L2, Elastic-net
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest
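The gradient memory is the key data structure: SAG stores the last gradient seen for each sample and steps along their running average. A numpy sketch for logistic loss, glossing over lightning's initialization and step-size details:

    import numpy as np

    def sag_logistic(X, y, step=0.1, n_epochs=20, seed=0):
        """SAG for logistic regression (labels in {-1, +1})."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        grads = np.zeros((n, d))             # last gradient seen for each sample
        g_avg = np.zeros(d)                  # running average of the stored gradients
        for _ in range(n_epochs):
            for i in rng.integers(0, n, size=n):
                s = 1.0 / (1.0 + np.exp(y[i] * (X[i] @ w)))  # sigmoid(-y * w.x)
                g_new = -y[i] * s * X[i]
                g_avg += (g_new - grads[i]) / n  # refresh only sample i's contribution
                grads[i] = g_new
                w -= step * g_avg                # step along the (mostly outdated) average
        return w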
Stochastic variance-reduced gradient (SVRG)
classification.SVRGClassifier, regression.SVRGRegressor
Main idea: compute the full gradient periodically and use it to center the gradient estimate (this can be shown to reduce the variance); see the sketch after this list
Non-smooth losses: No
Penalties: L2
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest
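In code, the centering amounts to correcting each stochastic gradient with the same sample's gradient at a snapshot point plus the snapshot's full gradient. A sketch for logistic loss, illustrative only:

    import numpy as np

    def svrg_logistic(X, y, step=0.1, n_outer=10, seed=0):
        """SVRG for logistic regression (labels in {-1, +1})."""
        rng = np.random.default_rng(seed)
        n, d = X.shape

        def sample_grad(w, i):
            s = 1.0 / (1.0 + np.exp(y[i] * (X[i] @ w)))  # sigmoid(-y * w.x)
            return -y[i] * s * X[i]

        w_snap = np.zeros(d)                 # snapshot where the full gradient is taken
        w = w_snap.copy()
        for _ in range(n_outer):
            full = sum(sample_grad(w_snap, i) for i in range(n)) / n  # periodic full gradient
            for i in rng.integers(0, n, size=n):
                # Centered estimate: unbiased, and its variance shrinks
                # as w approaches the snapshot.
                g = sample_grad(w, i) - sample_grad(w_snap, i) + full
                w -= step * g
            w_snap = w.copy()
        return w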
PRank
ranking.PRank, ranking.KernelPRank
Main idea: Perceptron-like algorithm for ordinal regression; see the sketch after this list
Penalties: L2
Learning rate: No
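A sketch of the PRank idea: a single weight vector plus ordered thresholds that carve the score line into ranks, both updated perceptron-style on violations. This is reconstructed from the published algorithm, not lightning's code; ranks are assumed to be integers in {1, ..., n_ranks}:

    import numpy as np

    def prank_fit(X, y, n_ranks, n_epochs=10, seed=0):
        """PRank: perceptron with ordered thresholds b_1, ..., b_{k-1}."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        b = np.zeros(n_ranks - 1)            # thresholds between adjacent ranks
        r = np.arange(1, n_ranks)
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                score = X[i] @ w
                y_r = np.where(r < y[i], 1.0, -1.0)  # should the score sit above threshold r?
                tau = np.where(y_r * (score - b) <= 0, y_r, 0.0)  # violated thresholds
                w += tau.sum() * X[i]                # perceptron-style corrections
                b -= tau
        return w, b

    def prank_predict(X, w, b):
        # Predicted rank = 1 + number of thresholds the score exceeds.
        return 1 + ((X @ w)[:, None] > b).sum(axis=1)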