
How to Control LLM Risks?

Risk control techniques for binary classification can make LLMs more trustworthy by serving as response guardrails, for example by censoring undesired content.
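As a rough illustration of such a guardrail, the sketch below picks a blocking threshold on a calibration set so that the rate of undesired responses slipping through stays below a target `alpha` with probability at least `1 - delta`. It uses a Hoeffding bound with fixed-sequence testing, which is one simple way to obtain such a guarantee and is not necessarily the procedure MAPIE implements. The function name, the `harm_scores` input (scores in [0, 1] from some hypothetical LLM-based classifier) and the threshold grid are assumptions made for this example.

```python
import numpy as np


def select_block_threshold(harm_scores, is_harmful, alpha=0.1, delta=0.1, grid=None):
    """Return the largest threshold t such that blocking responses with
    score >= t keeps the miss rate on harmful content below alpha, with
    probability at least 1 - delta. Returns None if no threshold qualifies.

    Hypothetical helper: not part of any library API.
    """
    harmful = np.asarray(is_harmful, dtype=bool)
    scores = np.asarray(harm_scores, dtype=float)[harmful]
    n = scores.size
    if n == 0:
        raise ValueError("the calibration set contains no harmful example")
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)

    # Hoeffding margin: true miss rate <= empirical miss rate + margin
    # with probability at least 1 - delta (losses are bounded in [0, 1]).
    margin = np.sqrt(np.log(1.0 / delta) / (2.0 * n))

    chosen = None
    # The miss rate grows with the threshold, so thresholds are tested from
    # the most conservative one upwards, stopping at the first failure
    # (fixed-sequence testing).
    for t in np.sort(np.asarray(grid, dtype=float)):
        miss_rate = np.mean(scores < t)  # harmful responses that would pass
        if miss_rate + margin > alpha:
            break
        chosen = float(t)
    return chosen


def guardrail(response, harm_score, threshold, placeholder="[content removed]"):
    """Censor a response whose harm score reaches the selected threshold."""
    return placeholder if harm_score >= threshold else response
```

With few harmful calibration examples the margin is large and the function returns `None`, which signals that no threshold can be certified at the requested levels.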


LLM as Binary Classifier

Conformal prediction methods can be applied to LLM-based classifiers. We follow the method presented in Benchmarking LLMs via Uncertainty Quantification, which:

  1. Reduces a commonsense reasoning task (CosmosQA dataset) to a classification problem
  2. Extracts only the logits corresponding to the possible answers
  3. Applies a softmax over these logits so that the LLM acts as a standard probabilistic classifier
  4. Applies conformal prediction to the resulting probabilities
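The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's code and not the MAPIE API: the function names are hypothetical, `logits` stands for the next-token logits of the LLM for each question (shape `(n_samples, vocab_size)`), `option_token_ids` holds the token ids of the answer letters, and the conformity score is the common choice of one minus the probability of the true answer. The `method="higher"` argument of `np.quantile` requires NumPy 1.22 or later.

```python
import numpy as np


def option_probabilities(logits, option_token_ids):
    """Keep only the logits of the answer options and apply a softmax."""
    selected = np.asarray(logits, dtype=float)[:, option_token_ids]
    shifted = selected - selected.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def conformal_quantile(calib_probs, calib_labels, alpha):
    """Split-conformal threshold on the score 1 - p(true answer)."""
    calib_labels = np.asarray(calib_labels)
    n = calib_labels.size
    scores = 1.0 - calib_probs[np.arange(n), calib_labels]
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return np.quantile(scores, level, method="higher")


def prediction_sets(probs, qhat):
    """Boolean mask: True where an answer option belongs to the prediction set."""
    return (1.0 - probs) <= qhat
```

A typical use is to compute `qhat` once on a held-out calibration split and then call `prediction_sets` on the probabilities of new questions. Larger sets indicate questions on which the model is less certain.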

Resources

Educational Repository

The following repository (not maintained by the MAPIE team) implements part of this paper for educational purposes in the MAPIE_for_cosmosqa notebook.

Blog Article

Read our blog article on Medium where we dive deeper into the topic.