Classification — Theoretical Description¶
Terminology
In theoretical parts of the documentation:
- `alpha` is equivalent to `1 - confidence_level`; it can be seen as a risk level.
- "calibrate" and "calibration" are equivalent to "conformalize" and "conformalization".
Four methods for multi-class uncertainty quantification have been implemented in MAPIE: LAC (Least Ambiguous set-valued Classifier) 1, APS (Adaptive Prediction Sets) 2 3, RAPS (Regularized Adaptive Prediction Sets) 3, and Top-K 3.
Mathematical Setting¶
For a classification problem in a standard i.i.d. case, our training data \((X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}\) has an unknown distribution \(P_{X, Y}\).
For any risk level \(\alpha \in (0, 1)\), the methods allow constructing a prediction set \(\hat{C}_{n, \alpha}(X_{n+1})\) with a marginal coverage guarantee:

\[P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1})\} \geq 1 - \alpha\]
For a typical \(\alpha = 10\%\), we construct prediction sets that contain the true labels of at least 90% of new test points.
Info
The guarantee applies only to marginal coverage, not conditional coverage \(P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) | X_{n+1} = x_{n+1} \}\) which depends on the location of the test point.
1. LAC¶
In the LAC method, the conformity score is one minus the score of the true label:

\[s_i(X_i, Y_i) = 1 - \hat{\mu}(X_i)_{Y_i}\]
The quantile \(\hat{q}\) is computed as the finite-sample-corrected empirical quantile of the conformity scores:

\[\hat{q} = \mathrm{Quantile}\left(s_1, \ldots, s_n; \frac{\lceil (n+1)(1-\alpha) \rceil}{n}\right)\]
The prediction set includes all labels whose score reaches the threshold \(1 - \hat{q}\):

\[\hat{C}_{n, \alpha}(X_{n+1}) = \{ y : \hat{\mu}(X_{n+1})_y \geq 1 - \hat{q} \}\]
Warning
Although LAC generally results in small prediction sets, it tends to produce empty sets when the model is uncertain (e.g., at the border between two classes).
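The three steps above (conformity scores, quantile, thresholded set) can be sketched in a few lines of NumPy. This is an illustrative split-conformal implementation, not MAPIE's API; the function name and arguments are invented for the example:

```python
import numpy as np

def lac_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Illustrative split-conformal LAC (not MAPIE's API).

    cal_probs: (n, K) predicted probabilities on the calibration set,
    cal_labels: (n,) true labels, test_probs: (m, K).
    Returns a boolean (m, K) set-membership mask.
    """
    n = len(cal_labels)
    # Conformity score: one minus the probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile, clipped to 1 for small n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    # Keep every label whose probability reaches the threshold 1 - q_hat.
    return test_probs >= 1.0 - q_hat
```

A row of the returned mask is all-`False` exactly when no label reaches the threshold, which is the empty-set behaviour the warning above describes.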
2. Top-K¶
Introduced in 3, the Top-K method gives the same prediction set size for all observations. The conformity score is the rank of the true label, with scores sorted in decreasing order:

\[s_i(X_i, Y_i) = k \quad \text{where} \quad Y_i = \pi_k \text{ and } \hat{\mu}(X_i)_{\pi_1} \geq \cdots \geq \hat{\mu}(X_i)_{\pi_K}\]
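The full Top-K procedure can be sketched as follows: the calibrated quantile of the rank scores gives a single set size `k_hat`, applied to every test point. The helper is illustrative, not MAPIE's API:

```python
import numpy as np

def topk_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Illustrative split-conformal Top-K (not MAPIE's API): every test
    point receives a prediction set of the same size k_hat."""
    n = len(cal_labels)
    # Conformity score: 1-based rank of the true label, scores sorted decreasing.
    order = np.argsort(-cal_probs, axis=1)
    ranks = np.argmax(order == cal_labels[:, None], axis=1) + 1
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    k_hat = int(np.quantile(ranks, level, method="higher"))
    # Keep the k_hat highest-scoring labels of each test point.
    test_order = np.argsort(-test_probs, axis=1)
    mask = np.zeros(test_probs.shape, dtype=bool)
    rows = np.arange(test_probs.shape[0])[:, None]
    mask[rows, test_order[:, :k_hat]] = True
    return mask
```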
3. Adaptive Prediction Sets (APS)¶
The APS method overcomes LAC's empty set problem by constructing non-empty prediction sets. Conformity scores are computed by summing the ranked scores until reaching the true label:

\[s_i(X_i, Y_i) = \sum_{j=1}^{k} \hat{\mu}(X_i)_{\pi_j} \quad \text{where} \quad Y_i = \pi_k \text{ and } \hat{\mu}(X_i)_{\pi_1} \geq \cdots \geq \hat{\mu}(X_i)_{\pi_K}\]
Prediction sets are built similarly, by accumulating the ranked scores of the test point until the quantile is reached:

\[\hat{C}_{n, \alpha}(X_{n+1}) = \{\pi_1, \ldots, \pi_{k^\star}\} \quad \text{where} \quad k^\star = \inf \Big\{ k : \sum_{j=1}^{k} \hat{\mu}(X_{n+1})_{\pi_j} \geq \hat{q} \Big\}\]
By default, the label whose cumulative score exceeds the quantile is included. Its incorporation can also be randomized for tighter effective coverage 2 3.
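The non-randomized (default) variant described above can be sketched as follows; labels are included in decreasing-score order until the cumulative mass crosses the quantile, and the crossing label is kept. The helpers are illustrative, not MAPIE's API:

```python
import numpy as np

def aps_scores(probs, labels):
    """APS conformity score: cumulative sorted probability up to the true label."""
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)
    true_rank = np.argmax(order == labels[:, None], axis=1)
    return cumsum[np.arange(len(labels)), true_rank]

def aps_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Illustrative split-conformal APS without randomization (not MAPIE's API)."""
    n = len(cal_labels)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(aps_scores(cal_probs, cal_labels), level, method="higher")
    # Include labels while the cumulative mass accumulated *before* each
    # label is still below q_hat; the label that crosses q_hat is kept.
    order = np.argsort(-test_probs, axis=1)
    sorted_probs = np.take_along_axis(test_probs, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)
    keep_sorted = cumsum - sorted_probs < q_hat
    mask = np.zeros(test_probs.shape, dtype=bool)
    np.put_along_axis(mask, order, keep_sorted, axis=1)
    return mask
```

Because the first label always satisfies the inclusion condition (its preceding cumulative mass is zero), the resulting sets are never empty.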
4. Regularized Adaptive Prediction Sets (RAPS)¶
RAPS 3 improves APS by regularizing the conformity score to avoid very large prediction sets:

\[s_i(X_i, Y_i) = \sum_{j=1}^{k} \hat{\mu}(X_i)_{\pi_j} + \lambda (k - k_{\text{reg}})^+ \quad \text{where} \quad Y_i = \pi_k\]
Where:
- \((z)^+\) denotes the positive part of \(z\)
- \(k_{\text{reg}}\) is the optimal set size (determined by the Top-K method on a held-out split)
- \(\lambda\) is a regularization parameter (grid search over \(\{0.001, 0.01, 0.1, 0.2, 0.5\}\))
Prediction set construction mirrors APS, with the penalized cumulative score compared to the quantile:

\[\hat{C}_{n, \alpha}(X_{n+1}) = \{\pi_1, \ldots, \pi_{k^\star}\} \quad \text{where} \quad k^\star = \inf \Big\{ k : \sum_{j=1}^{k} \hat{\mu}(X_{n+1})_{\pi_j} + \lambda (k - k_{\text{reg}})^+ \geq \hat{q} \Big\}\]
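A sketch of the regularized score and the resulting sets, following the same pattern as APS. The function names and default values of `lam` and `k_reg` are invented for the example; in practice they would come from the held-out tuning described above:

```python
import numpy as np

def raps_scores(probs, labels, lam=0.01, k_reg=2):
    """Illustrative RAPS conformity score (not MAPIE's API): the APS
    cumulative mass plus a penalty lam * (rank - k_reg)^+ on large ranks."""
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)
    rank = np.argmax(order == labels[:, None], axis=1) + 1  # 1-based rank
    return cumsum[np.arange(len(labels)), rank - 1] + lam * np.maximum(rank - k_reg, 0)

def raps_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1, lam=0.01, k_reg=2):
    """Illustrative split-conformal RAPS without randomization (not MAPIE's API)."""
    n = len(cal_labels)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(raps_scores(cal_probs, cal_labels, lam, k_reg), level, method="higher")
    # Include labels while the penalized cumulative mass accumulated before
    # each label is still below q_hat; the penalty discourages large sets.
    order = np.argsort(-test_probs, axis=1)
    sorted_probs = np.take_along_axis(test_probs, order, axis=1)
    ranks = np.arange(1, test_probs.shape[1] + 1)
    penalized = np.cumsum(sorted_probs, axis=1) + lam * np.maximum(ranks - k_reg, 0)
    prev = np.concatenate([np.zeros((len(test_probs), 1)), penalized[:, :-1]], axis=1)
    mask = np.zeros(test_probs.shape, dtype=bool)
    np.put_along_axis(mask, order, prev < q_hat, axis=1)
    return mask
```

With `lam = 0`, this reduces exactly to the APS construction of the previous section.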
Exact Coverage via Randomization¶
To achieve exact coverage, randomization on the last label can be applied:
- Define \(V_i = \frac{s_i(X_i, Y_i) - \hat{q}_{1-\alpha}}{\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1}(k > k_{\text{reg}})}\)
- Compare each \(V_i\) to \(U \sim \text{Unif}(0, 1)\)
- If \(V_i \leq U\), the last included label is removed.
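A literal sketch of these three steps applied to APS-style sets. The helper name is invented, and the removal convention follows the description above (drop the last label when \(V \leq U\)); the `V` computed here uses the set's overshoot past \(\hat{q}\) divided by the last label's mass, without the RAPS penalty term:

```python
import numpy as np

def randomize_last_label(test_probs, mask, q_hat, seed=0):
    """Sketch of exact-coverage randomization (invented helper): for each
    prediction set, compute V from the set's cumulative mass, draw
    U ~ Unif(0, 1), and drop the last included label when V <= U."""
    rng = np.random.default_rng(seed)
    out = mask.copy()
    for i in range(len(mask)):
        k = int(mask[i].sum())
        if k == 0:
            continue
        order = np.argsort(-test_probs[i])
        last = order[k - 1]                      # last (lowest-scoring) label kept
        cum = test_probs[i, order[:k]].sum()     # cumulative mass of the set
        # V compares the overshoot past q_hat with the last label's mass.
        v = (cum - q_hat) / test_probs[i, last]
        if v <= rng.uniform():
            out[i, last] = False
    return out
```

Only the last included label can ever be removed, so the randomized set is always a subset of the deterministic one.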
5. Split- and Cross-Conformal Strategies¶
MAPIE includes both split- and cross-conformal strategies for LAC and APS, but only split-conformal for Top-K.
The cross-conformal implementation follows Algorithm 2 of 2:
- Split training into \(K\) disjoint subsets.
- Fit \(K\) classification functions \(\hat{\mu}_{-S_k}\).
- Compute out-of-fold conformity scores.
- For new test points, compare conformity scores to decide label inclusion.
For APS, the label inclusion rule follows eq. 11 of 2, which compares each candidate label's conformity scores under the fold models with the out-of-fold calibration scores.
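The four steps can be sketched with LAC scores (APS scores would slot into the same structure). The classifier and both function names are invented for the example; any fitted probabilistic model could replace the toy centroid classifier:

```python
import numpy as np

def softmax_centroid(X_tr, y_tr, X):
    """Toy probabilistic classifier (softmax of negative centroid distances);
    it stands in for any fitted model mu_hat and is not part of MAPIE."""
    classes = np.unique(y_tr)
    cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = -np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    e = np.exp(d - d.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_conformal_lac(X, y, X_test, alpha=0.1, K=3):
    """Sketch of the four steps above with LAC scores: K disjoint folds,
    K fold models, out-of-fold scores, then a counting rule for inclusion."""
    n, n_classes = len(y), len(np.unique(y))
    fold = np.arange(n) % K                      # disjoint subsets S_1..S_K
    oof = np.empty(n)                            # out-of-fold conformity scores
    test_s = np.empty((K, len(X_test), n_classes))
    for k in range(K):
        tr, va = fold != k, fold == k
        p_va = softmax_centroid(X[tr], y[tr], X[va])
        oof[va] = 1.0 - p_va[np.arange(va.sum()), y[va]]
        test_s[k] = 1.0 - softmax_centroid(X[tr], y[tr], X_test)
    # Include a label when fewer than (1 - alpha)(n + 1) out-of-fold scores
    # fall strictly below its test score under the matching fold model.
    counts = (oof[:, None, None] < test_s[fold]).sum(axis=0)
    return counts < (1 - alpha) * (n + 1)
```

No single refit model is needed: each calibration point is compared with the test scores produced by the fold model that did not see it.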
References¶
1. Sadinle, Mauricio, Jing Lei, and Larry Wasserman. "Least Ambiguous Set-Valued Classifiers With Bounded Error Levels." JASA, 114:525, 223-234, 2019.
2. Romano, Yaniv, Matteo Sesia, and Emmanuel J. Candès. "Classification with Valid and Adaptive Coverage." NeurIPS 2020 (spotlight).
3. Angelopoulos, Anastasios N., Stephen Bates, Michael Jordan, and Jitendra Malik. "Uncertainty Sets for Image Classifiers using Conformal Prediction." ICLR 2021.