Library design
The uncertainty estimation functionality in the library is built on three main entities:
Model wrappers: lm_polygraph/model_wrappers
Stat calculators: lm_polygraph/stat_calculators
Uncertainty estimators: lm_polygraph/estimators
This design separates LLM inference, heavy calculations, and uncertainty estimation from one another to achieve modularity, flexibility, and good evaluation performance.
Model wrappers
Model wrappers encapsulate the internals of the LLM inference process and provide a standardized interface for stat calculators. At the moment, the library supports:
Wrapper for whitebox LLMs: lm_polygraph.model_wrappers.WhiteboxModel and lm_polygraph.model_wrappers.WhiteboxModelBasic
Wrapper for whitebox LLMs run via vLLM: lm_polygraph.model_wrappers.WhiteboxModelvLLM
Wrapper for visual whitebox LLMs (image-to-text models): lm_polygraph.model_wrappers.VisualWhiteboxModel
Wrapper for LLMs deployed as services in the cloud, such as ChatGPT, Claude, etc. These models can be blackbox (when they provide only text) or greybox (when they also provide logits): lm_polygraph.model_wrappers.BlackboxModel
Different model types require different inference procedures, and wrappers abstract these differences away. Note also that not all stat calculators and estimators are available for all model types. For example, blackbox models do not provide logits, so only sampling-based and verbalized estimators are available for them. vLLM does not provide access to the internal states of the model, so attention-based methods are not supported for models run through it.
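The compatibility rules above can be pictured as a simple capability check. The sketch below is purely illustrative: the model type labels, method family names, and both tables are hypothetical and are not identifiers from lm_polygraph. It also assumes that a local whitebox model exposes text, logits, and internal states, and it uses "logit_based" as a made-up label for methods that need logits.

```python
from typing import Dict, List, Set

# Hypothetical table: what each kind of model can expose.
# The labels are illustrative and are NOT lm_polygraph identifiers.
MODEL_CAPABILITIES: Dict[str, Set[str]] = {
    "blackbox": {"text"},
    "whitebox_vllm": {"text", "logits"},
    "whitebox": {"text", "logits", "internal_states"},  # assumption
}

# Hypothetical table: what each family of UQ methods needs from the model.
METHOD_REQUIREMENTS: Dict[str, Set[str]] = {
    "sampling_based": {"text"},
    "verbalized": {"text"},
    "logit_based": {"text", "logits"},
    "attention_based": {"text", "internal_states"},
}


def supported_methods(model_type: str) -> List[str]:
    """Return the method families whose requirements the model type satisfies."""
    available = MODEL_CAPABILITIES[model_type]
    return sorted(
        name
        for name, needed in METHOD_REQUIREMENTS.items()
        if needed <= available  # set inclusion: every requirement is available
    )
```

With these tables, a blackbox model is left with the sampling-based and verbalized families only, and the vLLM-backed model loses the attention-based family, which mirrors the two examples in the paragraph above.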
Stat calculators
Stat calculators perform heavy computations on top of the LLM. They control the LLM’s inference and postprocess its results. This is necessary because UQ methods require special output from the LLM or need to aggregate the results of multiple inferences.
Usually, there is not just one stat calculator, but a chain of them. For example, to perform claim-level UQ, you need to run LLM inference with GreedyProbsCalculator and then split the generated text into atomic claims using ClaimsExtractor. During benchmarking, the results of the stat calculators can be consumed by many different uncertainty estimators, which avoids repeating the same expensive calculations.
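The chaining idea can be sketched with plain functions that read from and write to a shared dictionary of statistics. This is a toy model of the pattern, not the library's implementation: the two functions below only stand in for GreedyProbsCalculator and ClaimsExtractor, the generated text is hard-coded instead of coming from an LLM, and the dictionary keys are assumptions.

```python
from typing import Callable, Dict, List, Optional

Stats = Dict[str, object]


def greedy_generation(stats: Stats) -> Stats:
    # Stand-in for LLM inference; a real calculator would call the model here.
    return {"greedy_text": "Paris is in France. It has a tower."}


def claim_splitter(stats: Stats) -> Stats:
    # Naive sentence split used as a placeholder for claim extraction.
    text = str(stats["greedy_text"])
    return {"claims": [part.strip() for part in text.split(".") if part.strip()]}


def run_chain(
    calculators: List[Callable[[Stats], Stats]],
    stats: Optional[Stats] = None,
) -> Stats:
    """Run calculators in order; each one sees the results of the previous ones."""
    stats = dict(stats or {})
    for calculator in calculators:
        stats.update(calculator(stats))
    return stats


# The chain runs once ...
stats = run_chain([greedy_generation, claim_splitter])

# ... and any number of cheap consumers can then read the same results.
num_claims = len(stats["claims"])
text_length = len(str(stats["greedy_text"]))
```

The point of the pattern is that the expensive step (here, the fake generation) is executed a single time, while everything downstream works on the cached dictionary.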
Because inference procedures differ between model types, stat calculators are not universally compatible with all model wrappers. To determine which LLM types a stat calculator supports, look at the type of the model argument in its __call__ method. The most general stat calculators declare this argument with the type Model. A more specific case is GreedyProbsVisualCalculator, whose model argument has the type VisualWhiteboxModel.
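Reading the annotation of the model argument can also be done programmatically with the standard typing.get_type_hints function. The classes below are self-contained stand-ins so that the snippet runs without the library installed. Their names and the __call__ signature are assumptions made for illustration, and the same helper would only work on real stat calculators if their annotations can be resolved at runtime.

```python
from typing import Dict, List, Optional, get_type_hints


class Model:  # stand-in for the most general model type
    pass


class VisualWhiteboxModel(Model):  # stand-in for a more specific wrapper
    pass


class GeneralCalculator:  # hypothetical calculator that accepts any model
    def __call__(self, dependencies: Dict[str, object], texts: List[str],
                 model: Model) -> Dict[str, object]:
        return {}


class VisualOnlyCalculator:  # hypothetical calculator tied to visual models
    def __call__(self, dependencies: Dict[str, object], texts: List[str],
                 model: VisualWhiteboxModel) -> Dict[str, object]:
        return {}


def accepted_model_type(calculator_cls: type) -> Optional[type]:
    """Return the annotated type of the `model` argument of `__call__`, if any."""
    return get_type_hints(calculator_cls.__call__).get("model")


def supports(calculator_cls: type, model: object) -> bool:
    """Check whether `model` is an instance of the annotated model type."""
    expected = accepted_model_type(calculator_cls)
    return expected is not None and isinstance(model, expected)
```

For example, `supports(VisualOnlyCalculator, Model())` is false, while `supports(GeneralCalculator, VisualWhiteboxModel())` is true, because the visual wrapper in this sketch is a subclass of the general type.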
Estimators
Estimators are the final step in the uncertainty estimation process. They take the results of the stat calculators and aggregate them into an uncertainty score. Most estimators are computationally light: the heavy computations are left to the stat calculators so that, in benchmarking, their results can be reused by multiple uncertainty estimators.
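To illustrate how light the final aggregation step can be, here is a toy estimator that turns per-token log-probabilities into one score per sequence by negating their sum. It is a sketch under stated assumptions: the function name and the "token_log_probs" key are hypothetical, and the input values are made up rather than produced by a model.

```python
import math
from typing import Dict, List


def negative_log_likelihood(stats: Dict[str, List[List[float]]]) -> List[float]:
    """One score per sequence: minus the sum of its token log-probabilities.

    A higher score means the sequence was less probable under the model.
    """
    return [-sum(token_log_probs) for token_log_probs in stats["token_log_probs"]]


# Made-up statistics for two sequences (two tokens and one token).
stats = {
    "token_log_probs": [
        [math.log(0.5), math.log(0.5)],
        [math.log(0.9)],
    ]
}
scores = negative_log_likelihood(stats)
```

The first sequence gets the higher score here because its tokens were individually less probable. All the expensive work, that is, obtaining the log-probabilities, happened before the estimator was called.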
Automatic resolution of stat calculators for estimators
UEManager is used for automatic resolution of stat calculators for estimators. This is crucial for:
The high-level API, represented by the estimate_uncertainty function.
The benchmarking process, represented by the polygraph_eval script.
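One common way to implement this kind of automatic resolution is a depth-first walk over declared dependencies. The sketch below shows the general idea only and makes no claim about how UEManager works internally: the registry, the calculator names, and the statistic names are all hypothetical, and the dependency graph is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: calculator name -> (stats it provides, stats it needs).
REGISTRY: Dict[str, Tuple[List[str], List[str]]] = {
    "greedy": (["greedy_text", "token_log_probs"], []),
    "claims": (["claims"], ["greedy_text"]),
    "sampling": (["sample_texts"], []),
}


def resolve(required_stats: List[str]) -> List[str]:
    """Return calculator names in an order that satisfies every dependency.

    Assumes the dependency graph has no cycles.
    """
    # Map every statistic to the calculator that produces it.
    provider = {
        stat: name
        for name, (provided, _) in REGISTRY.items()
        for stat in provided
    }
    order: List[str] = []

    def visit(stat: str) -> None:
        name = provider[stat]  # raises KeyError if nothing provides the stat
        if name in order:
            return  # already scheduled
        for dependency in REGISTRY[name][1]:
            visit(dependency)  # schedule prerequisites first
        order.append(name)

    for stat in required_stats:
        visit(stat)
    return order
```

In this toy registry, an estimator that needs "claims" gets the order ["greedy", "claims"], and the unrelated "sampling" calculator is never scheduled, so only the calculators that are actually needed get run.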
Default configuration
Both estimate_uncertainty and the stat_calculators: auto option in polygraph_eval use the configuration of stat_calculators and estimators specified in the lm_polygraph/defaults directory.