Introduction

lightning is composed of three modules: classification, regression and ranking. Several solvers are available in each module.

If you’re not sure which solver to use, start with classification.CDClassifier / regression.CDRegressor or classification.SDCAClassifier / regression.SDCARegressor. They are very fast and do not require tedious learning-rate tuning; a quick-start example is shown below.
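
For a first try, something along these lines should work (a minimal sketch on synthetic data; the loss, penalty and max_iter values are illustrative, not tuned):

    from sklearn.datasets import make_classification
    from lightning.classification import CDClassifier

    # Small synthetic binary problem; any (n_samples, n_features) matrix works.
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # Primal coordinate descent on the squared hinge loss with an L2 penalty:
    # no learning rate to tune.
    clf = CDClassifier(loss="squared_hinge", penalty="l2", max_iter=20,
                       random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))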

Primal coordinate descent

classification.CDClassifier, regression.CDRegressor (example below)

  • Main idea: update a single coordinate at a time (closed-form update when possible, coordinate-wise gradient descent otherwise)

  • Non-smooth losses: No

  • Penalties: L2, L1, L1/L2

  • Learning rate: No

  • Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
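
For instance, the direct multiclass formulation with a group-sparse L1/L2 penalty can be set up roughly as follows (a sketch; the alpha, C and max_iter values are placeholders):

    from sklearn.datasets import load_digits
    from lightning.classification import CDClassifier

    X, y = load_digits(return_X_y=True)  # 10-class problem

    # Multiclass squared hinge with an L1/L2 penalty, which zeroes out entire
    # feature columns of the coefficient matrix at once.
    clf = CDClassifier(loss="squared_hinge", penalty="l1/l2", multiclass=True,
                       alpha=1e-4, C=1.0 / X.shape[0], max_iter=20, tol=1e-3)
    clf.fit(X, y)
    print(clf.score(X, y))
    print((clf.coef_ != 0).any(axis=0).sum(), "features kept")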

Dual coordinate ascent

classification.LinearSVC, regression.LinearSVR (L2-regularization, supports shrinking)

classification.SDCAClassifier, regression.SDCARegressor (Elastic-net, supports many losses; example below)

  • Main idea: update a single dual coordinate at a time (closed-form solution available for many loss functions)

  • Non-smooth losses: Yes

  • Penalties: L2, Elastic-net

  • Learning rate: No

  • Multiclass: one-vs-rest
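
As an illustration, an SDCA run with an elastic-net penalty might look like this (a sketch assuming alpha sets the overall regularization strength and l1_ratio the L1 share; the values are not tuned):

    from sklearn.datasets import make_classification
    from lightning.classification import SDCAClassifier

    X, y = make_classification(n_samples=500, n_features=50, random_state=0)

    # Dual coordinate ascent on the hinge loss; the closed-form coordinate
    # updates mean there is no learning rate to tune.
    clf = SDCAClassifier(loss="hinge", alpha=1e-3, l1_ratio=0.5, max_iter=50,
                         random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))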

FISTA

classification.FistaClassifier, regression.FistaRegressor (example below)

  • Main idea: accelerated proximal gradient method (uses full gradients)

  • Non-smooth losses: No

  • Penalties: L1, L1/L2, Trace/Nuclear

  • Learning rate: No

  • Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
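
A minimal FISTA sketch with an L1 penalty (the loss must be smooth, e.g. the squared hinge; the alpha value is a placeholder):

    from sklearn.datasets import make_classification
    from lightning.classification import FistaClassifier

    X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                               random_state=0)

    # Accelerated proximal gradient over the full dataset; the proximal step
    # handles the non-smooth L1 penalty, so no learning rate is needed.
    clf = FistaClassifier(loss="squared_hinge", penalty="l1", alpha=1e-2,
                          max_iter=100)
    clf.fit(X, y)
    print(clf.score(X, y))
    print((clf.coef_ != 0).sum(), "non-zero weights")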

Stochastic gradient method (SGD)

classification.SGDClassifier, regression.SGDRegressor (example below)

  • Main idea: replace the full gradient with a stochastic estimate obtained from a single sample

  • Non-smooth losses: Yes

  • Penalties: L2, L1, L1/L2

  • Learning rate: Yes (very sensitive)

  • Multiclass: one-vs-rest, multiclass logistic, multiclass squared hinge
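
A rough SGD sketch; eta0 is the initial step size and, as noted above, usually needs careful tuning (the values here are arbitrary starting points):

    from sklearn.datasets import make_classification
    from lightning.classification import SGDClassifier

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    # Plain stochastic gradient descent on the hinge loss with an L2 penalty;
    # performance depends strongly on the step size eta0.
    clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, eta0=0.1,
                        random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))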

AdaGrad

classification.AdaGradClassifier, regression.AdaGradRegressor (example below)

  • Main idea: use per-feature learning rates (frequently occurring features in the gradients get smaller learning rates, while infrequent features get larger ones)

  • Non-smooth losses: Yes

  • Penalties: L2, Elastic-net

  • Learning rate: Yes (not very sensitive)

  • Multiclass: one-vs-rest
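
A minimal AdaGrad sketch (eta is the base step size; the values are illustrative and typically need little tuning):

    from sklearn.datasets import make_classification
    from lightning.classification import AdaGradClassifier

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    # AdaGrad scales each feature's step size by its accumulated squared
    # gradients, so rarely seen features take larger steps.
    clf = AdaGradClassifier(eta=1.0, alpha=1e-4, loss="hinge")
    clf.fit(X, y)
    print(clf.score(X, y))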

Stochastic averaged gradient (SAG and SAGA)

classification.SAGClassifier, classification.SAGAClassifier, regression.SAGRegressor, regression.SAGARegressor (example below)

  • Main idea: instead of using the full gradient (the average of the sample-wise gradients), compute the gradient for a randomly selected sample and reuse outdated gradients for the other samples

  • Non-smooth losses: Yes (classification.SAGAClassifier and regression.SAGARegressor)

  • Penalties: L1, L2, Elastic-net

  • Learning rate: Yes (not very sensitive)

  • Multiclass: one-vs-rest
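
A rough SAGA sketch with an L1 penalty on the logistic loss (a sketch assuming, as in the library's SAG/SAGA estimators, that beta scales the penalty selected by the penalty argument; the values are placeholders):

    from sklearn.datasets import make_classification
    from lightning.classification import SAGAClassifier

    X, y = make_classification(n_samples=1000, n_features=100, n_informative=10,
                               random_state=0)

    # SAGA keeps a table of per-sample gradients and refreshes one entry per
    # step, which allows a non-smooth L1 penalty on top of the smooth loss.
    clf = SAGAClassifier(loss="log", penalty="l1", beta=1e-4, max_iter=20,
                         random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))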

Stochastic variance-reduced gradient (SVRG)

classification.SVRGClassifier, regression.SVRGRegressor (example below)

  • Main idea: periodically compute the full gradient and use it to center the stochastic gradient estimates (this can be shown to reduce their variance)

  • Non-smooth losses: No

  • Penalties: L2

  • Learning rate: Yes (not very sensitive)

  • Multiclass: one-vs-rest
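
A minimal SVRG sketch (eta is the step size; the values are illustrative):

    from sklearn.datasets import make_classification
    from lightning.classification import SVRGClassifier

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    # SVRG periodically recomputes the full gradient and uses it to correct
    # the per-sample updates, reducing their variance.
    clf = SVRGClassifier(loss="squared_hinge", eta=1e-2, alpha=1e-4, max_iter=20,
                         random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))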

PRank

ranking.PRank, ranking.KernelPRank (example below)

  • Main idea: Perceptron-like algorithm for ordinal regression

  • Penalties: L2

  • Learning rate: No
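
A minimal PRank sketch on synthetic ordinal labels (the ordinal targets are just a binned regression target, and the n_iter value is a placeholder for the number of passes):

    import numpy as np
    from sklearn.datasets import make_regression
    from lightning.ranking import PRank

    # Build five ordered rank levels by binning a continuous target.
    X, y_cont = make_regression(n_samples=500, n_features=20, random_state=0)
    y = np.digitize(y_cont, np.percentile(y_cont, [20, 40, 60, 80]))

    # Perceptron-style ordinal regression: a weight vector plus thresholds
    # separating consecutive rank levels.
    est = PRank(n_iter=10)
    est.fit(X, y)
    print(np.mean(est.predict(X) == y))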