Timing comparison with scikit-learn for Lasso

Compare the time skglm and scikit-learn take to solve large-scale Lasso and elastic net problems.
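For reference, the objective that the script's `compute_obj` helper evaluates can be written as below. The symbols n (number of samples), p (number of features) and ρ (the `l1_ratio` argument) are notation introduced here for readability; ρ = 1 gives the Lasso and ρ = 0.5 the elastic net setting used in the script. The second expression is the regularization strength the script picks, which for the Lasso is one tenth of the smallest value at which the solution is entirely zero.

```latex
\min_{w \in \mathbb{R}^p} \;
\frac{1}{2n} \lVert y - Xw \rVert_2^2
+ \alpha \rho \lVert w \rVert_1
+ \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2,
\qquad
\alpha = \frac{1}{10} \cdot \frac{\lVert X^\top y \rVert_\infty}{n}
```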

[Figure: objective suboptimality versus time (s) for sklearn and skglm; panels: lasso, enet]

import time
import warnings
import numpy as np
from numpy.linalg import norm
import matplotlib.pyplot as plt
from libsvmdata import fetch_libsvm

from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso as Lasso_sklearn
from sklearn.linear_model import ElasticNet as Enet_sklearn

from skglm import Lasso, ElasticNet

warnings.filterwarnings('ignore', category=ConvergenceWarning)


def compute_obj(X, y, w, alpha, l1_ratio=1):
    loss = norm(y - X @ w) ** 2 / (2 * len(y))
    penalty = (alpha * l1_ratio * np.sum(np.abs(w))
               + 0.5 * alpha * (1 - l1_ratio) * norm(w) ** 2)
    return loss + penalty


X, y = fetch_libsvm("news20.binary")
# Use one tenth of the smallest alpha for which the Lasso solution is all zeros
alpha = np.max(np.abs(X.T @ y)) / len(y) / 10

dict_sklearn = {}
dict_sklearn["lasso"] = Lasso_sklearn(
    alpha=alpha, fit_intercept=False, tol=1e-12)

dict_sklearn["enet"] = Enet_sklearn(
    alpha=alpha, fit_intercept=False, tol=1e-12, l1_ratio=0.5)

dict_ours = {}
dict_ours["lasso"] = Lasso(
    alpha=alpha, fit_intercept=False, tol=1e-12)
dict_ours["enet"] = ElasticNet(
    alpha=alpha, fit_intercept=False, tol=1e-12, l1_ratio=0.5)

models = ["lasso", "enet"]

fig, axarr = plt.subplots(2, 1, constrained_layout=True)

for ax, model, l1_ratio in zip(axarr, models, [1, 0.5]):
    pobj_dict = {"sklearn": [], "us": []}
    time_dict = {"sklearn": [], "us": []}

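    # Fit once to convergence: this excludes compilation time from the timings
    # below and gives the reference solution used to measure suboptimality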
    dict_ours[model].max_iter = 10_000
    w_star = dict_ours[model].fit(X, y).coef_
    pobj_star = compute_obj(X, y, w_star, alpha, l1_ratio)
    for n_iter_sklearn in np.unique(np.geomspace(1, 50, num=15).astype(int)):
        dict_sklearn[model].max_iter = n_iter_sklearn

        t_start = time.time()
        w_sklearn = dict_sklearn[model].fit(X, y).coef_
        time_dict["sklearn"].append(time.time() - t_start)
        pobj_dict["sklearn"].append(compute_obj(X, y, w_sklearn, alpha, l1_ratio))

    for n_iter_us in range(1, 10):
        dict_ours[model].max_iter = n_iter_us
        t_start = time.time()
        w = dict_ours[model].fit(X, y).coef_
        time_dict["us"].append(time.time() - t_start)
        pobj_dict["us"].append(compute_obj(X, y, w, alpha, l1_ratio))

    ax.semilogy(
        time_dict["sklearn"], np.array(pobj_dict["sklearn"]) - pobj_star,
        label='sklearn')
    ax.semilogy(
        time_dict["us"], np.array(pobj_dict["us"]) - pobj_star, label='skglm')

    ax.set_ylim((1e-10, 1))
    ax.set_title(model)
    ax.legend()
    ax.set_ylabel("Objective suboptimality")

axarr[1].set_xlabel("Time (s)")
plt.show(block=False)
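
The script above times each fit once with `time.time()`. As a rough sketch of a more noise-resistant variant, the helper below repeats a call a few times with `time.perf_counter()` and keeps the fastest run. The name `time_best_of` and its `n_repeats` parameter are hypothetical and not part of skglm or scikit-learn, and the workload is a stand-in so that the snippet runs on its own.

```python
import time


def time_best_of(func, n_repeats=3):
    """Call func n_repeats times; return (fastest elapsed seconds, last result).

    Hypothetical helper for illustration only.
    """
    best = float("inf")
    result = None
    for _ in range(n_repeats):
        t_start = time.perf_counter()
        result = func()
        elapsed = time.perf_counter() - t_start
        best = min(best, elapsed)
    return best, result


# Stand-in workload. In the script above, the callable could instead be
# something like: lambda: dict_ours[model].fit(X, y).coef_
elapsed, value = time_best_of(lambda: sum(i * i for i in range(10_000)))
print(f"best of 3 runs: {elapsed:.6f} s")
```

Keeping the minimum rather than the mean is a common choice for benchmarks, since interference from other processes can only make a run slower.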

Total running time of the script: (1 minute 1.094 seconds)
