How to Control LLM Risks?

Risk control techniques for binary classification can enhance LLM trustworthiness by implementing response guardrails, such as censoring undesired content.

LLM as Binary Classifier

Conformal prediction methods can be applied to LLM-based classifiers. We follow the method presented in the paper "Benchmarking LLMs via Uncertainty Quantification", which:

Reduces a commonsense reasoning task (CosmosQA dataset) to a classification problem
Extracts only the logits corresponding to the possible answers
Applies a softmax over those logits so the LLM can be used as a simple classifier over the answer options
Enables the use of conformal prediction on the resulting probabilities
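
The steps above can be sketched with plain NumPy. This is a minimal illustration, not the paper's code nor MAPIE's API: the logits are synthetic stand-ins for the values an LLM would assign to the answer-option tokens, the number of options (`n_options = 4`) is an assumption, and the helper names (`softmax`, `conformal_threshold`, `prediction_sets`, `fake_option_logits`) are hypothetical. The conformity score used here is one minus the probability of the true answer, a common choice for split conformal classification.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over the answer-option logits (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal quantile of the scores 1 - p(true answer)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: fall back to including every option.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_sets(probs, qhat):
    """Boolean mask: option j is in the set when 1 - p_j <= qhat."""
    return (1.0 - probs) <= qhat


# Synthetic stand-in for LLM logits restricted to the answer options
# (assumption: in practice these come from the model's next-token logits).
rng = np.random.default_rng(0)
n_cal, n_test, n_options = 500, 200, 4


def fake_option_logits(labels):
    logits = rng.normal(size=(len(labels), n_options))
    logits[np.arange(len(labels)), labels] += 2.0  # favor the true answer
    return logits


cal_labels = rng.integers(0, n_options, size=n_cal)
test_labels = rng.integers(0, n_options, size=n_test)
cal_probs = softmax(fake_option_logits(cal_labels))
test_probs = softmax(fake_option_logits(test_labels))

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, qhat)

coverage = sets[np.arange(n_test), test_labels].mean()
avg_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_size:.2f}")
```

With `alpha=0.1`, the prediction sets are built to contain the true answer at least 90% of the time on average, provided calibration and test questions are exchangeable. The average set size then serves as a measure of the model's uncertainty: larger sets indicate a less confident classifier.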

Resources

Educational Repository

The following repository (not maintained by the MAPIE team) implements part of this paper for educational purposes in the MAPIE_for_cosmosqa notebook.

Blog Article

Read our blog article on Medium where we dive deeper into the topic.