Introduction¶

lightning is composed of three modules: classification, regression and ranking. Several solvers are available from each.

If you’re not sure what solver to use, just go for classification.CDClassifier / regression.CDRegressor or classification.SDCAClassifier / regression.SDCARegressor. They are very fast and do not require any tedious tuning of a learning rate.

Primal coordinate descent¶

classification.CDClassifier, regression.CDRegressor

Main idea: update a single coordinate at a time (closed-form update when possible, coordinate-wise gradient descent otherwise)
Non-smooth losses: No
Penalties: L2, L1, L1/L2
Learning rate: No
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge

Dual coordinate ascent¶

classification.LinearSVC, regression.LinearSVR (L2-regularization, supports shrinking)

classification.SDCAClassifier, regression.SDCARegressor (Elastic-net, supports many losses)

Main idea: update a single dual coordinate at a time (closed-form solution available for many loss functions)
Non-smooth losses: Yes
Penalties: L2, Elastic-net
Learning rate: No
Multiclass: one-vs-rest

FISTA¶

classification.FistaClassifier, regression.FistaRegressor

Main idea: accelerated proximal gradient method (uses full gradients)
Non-smooth losses: No
Penalties: L1, L1/L2, Trace/Nuclear
Learning rate: No
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge

Stochastic gradient method (SGD)¶

classification.SGDClassifier, regression.SGDRegressor

Main idea: replace full gradient with stochastic estimate obtained from a single sample
Non-smooth losses: Yes
Penalties: L2, L1, L1/L2
Learning rate: Yes (very sensitive)
Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge

AdaGrad¶

classification.AdaGradClassifier, regression.AdaGradRegressor

Main idea: use per-feature learning rates (frequently occurring features in the gradients get small learning rates and infrequent features get higher ones)
Non-smooth losses: Yes
Penalties: L2, Elastic-net
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest

Stochastic averaged gradient (SAG and SAGA)¶

classification.SAGClassifier, classification.SAGAClassifier, regression.SAGRegressor, regression.SAGARegressor

Main idea: instead of using the full gradient (average of sample-wise gradients), compute gradient for a randomly selected sample and use out-dated gradients for other samples
Non-smooth losses: Yes (classification.SAGAClassifier and regression.SAGARegressor)
Penalties: L1, L2, Elastic-net
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest

Stochastic variance-reduced gradient (SVRG)¶

classification.SVRGClassifier, regression.SVRGRegressor

Main idea: compute full gradient periodically and use it to center the gradient estimate (this can be shown to reduce the variance)
Non-smooth losses: No
Penalties: L2
Learning rate: Yes (not very sensitive)
Multiclass: one-vs-rest

PRank¶

ranking.PRank, ranking.KernelPRank

Main idea: Perceptron-like algorithm for ordinal regression
Penalties: L2
Learning rate: No