lm_polygraph.utils.api_with_uncertainty module

API model wrapper with uncertainty estimation, analogous to VLLMWithUncertainty.

Wraps any OpenAI-compatible API model with lm-polygraph uncertainty scoring. Supports both generation (delegated to the wrapped model) and standalone scoring of pre-extracted logprobs.

Usage:

    from lm_polygraph.estimators import MeanTokenEntropy
    from lm_polygraph.stat_calculators import VLLMLogprobsExtractionCalculator, EntropyCalculator
    from lm_polygraph.utils import APIWithUncertainty

    # Wrap an existing API model (blackbox_model is created elsewhere)
    model_with_uncertainty = APIWithUncertainty(
        model=blackbox_model,
        stat_calculators=[VLLMLogprobsExtractionCalculator(), EntropyCalculator()],
        estimator=MeanTokenEntropy(),
    )

    # Option 1: Generate with immediate scoring
    results = model_with_uncertainty.generate(chats, max_new_tokens=1024, n=8)
    # results[i][j]["uncertainty_score"], results[i][j]["token_ids"], etc.
    # (i indexes the chat, j indexes the sampled completion)

    # Option 2: Score pre-extracted logprobs separately
    uncertainty = model_with_uncertainty.score(token_ids, logprobs)

    # Get pseudo-tokenizer for step boundary mapping
    tokenizer = model_with_uncertainty.get_tokenizer()
    tokenizer.set_context(token_ids, logprobs)
    text = tokenizer.decode(token_ids[0:5])

class lm_polygraph.utils.api_with_uncertainty.APILogprobData(logprob: float, token: str)[source]

Bases: object

Minimal logprob entry mirroring vLLM’s logprob format.

logprob: float
token: str

class lm_polygraph.utils.api_with_uncertainty.APIWithUncertainty(model=None, stat_calculators: List = None, estimator=None)[source]

Bases: object

Wraps an OpenAI-compatible API model with uncertainty estimation, analogous to VLLMWithUncertainty for vLLM models.

Delegates generation to the wrapped model and scores outputs using lm-polygraph stat calculators and estimators. Also supports standalone scoring of pre-extracted logprobs via score().

Args:
model: API model instance with a generate_texts(chats, **kwargs) method that returns results with "logprobs" in OpenAI API format. Can be None if only score() is used on pre-extracted logprobs.

stat_calculators: List of lm-polygraph stat calculators (e.g., [VLLMLogprobsExtractionCalculator(), EntropyCalculator()]).

estimator: lm-polygraph Estimator instance (e.g., MeanTokenEntropy, Perplexity).

generate(chats: List[List[Dict[str, str]]], compute_uncertainty: bool = True, **kwargs) → List[List[Dict]][source]

Generate completions with optional uncertainty scores.

Delegates to the wrapped model’s generate_texts(), converts logprobs to vLLM format, and optionally computes uncertainty scores.

Args:

chats: List of chat message lists.

compute_uncertainty: If True, compute uncertainty for all outputs.

**kwargs: Generation parameters passed to model.generate_texts() (max_new_tokens, temperature, n, stop, etc.).

Returns:
List of lists of result dicts. Each result dict contains:
  • text: Generated text

  • logprobs: API-format logprobs

  • token_ids: Pseudo token IDs (vLLM-compatible)

  • vllm_logprobs: Logprobs in vLLM format

  • uncertainty_score: Float (if compute_uncertainty=True)

  • finish_reason: API finish reason
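As a rough illustration of working with this nested return value, the sketch below picks the lowest-uncertainty completion for each chat. The helper name pick_most_confident and the mock data are hypothetical; only the dict keys come from the list above.

```python
from typing import Dict, List, Optional


def pick_most_confident(results: List[List[Dict]]) -> List[Optional[Dict]]:
    """For each chat, return the sampled completion with the lowest score.

    Hypothetical helper, not part of lm-polygraph. Assumes each result dict
    follows the documented shape; entries without "uncertainty_score"
    (compute_uncertainty=False) are skipped.
    """
    best: List[Optional[Dict]] = []
    for samples in results:
        scored = [s for s in samples if "uncertainty_score" in s]
        if scored:
            best.append(min(scored, key=lambda s: s["uncertainty_score"]))
        else:
            best.append(None)
    return best


# Mock data shaped like the documented return value (values are illustrative).
mock_results = [
    [
        {"text": "Paris", "uncertainty_score": 0.12, "finish_reason": "stop"},
        {"text": "Lyon", "uncertainty_score": 0.87, "finish_reason": "stop"},
    ],
    [
        {"text": "unscored", "finish_reason": "length"},
    ],
]

chosen = pick_most_confident(mock_results)
```

Because higher scores mean more uncertainty, min() selects the completion the estimator considers most reliable; a chat with no scored completions yields None.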

get_tokenizer()[source]

Return pseudo-tokenizer for step boundary mapping.

The returned tokenizer implements decode(token_ids) by looking up token text from logprob entries. Call tokenizer.set_context() with the full trajectory’s token_ids and logprobs before using decode().
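The lookup behaviour described above can be pictured with a minimal stand-in. This is a sketch under assumptions, not the object returned by get_tokenizer(): the class name PseudoTokenizerSketch and the Entry type are hypothetical, and it assumes each step's logprob dict contains the chosen token's pseudo ID.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a logprob entry exposing .logprob and .token."""
    logprob: float
    token: str


class PseudoTokenizerSketch:
    """Illustrative only: decodes pseudo token IDs via a stored context."""

    def __init__(self) -> None:
        self._id_to_text: Dict[int, str] = {}

    def set_context(self, token_ids: List[int], logprobs: List[Dict[int, Entry]]) -> None:
        # Rebuild the ID -> text table from the chosen token at every step.
        self._id_to_text = {
            token_id: candidates[token_id].token
            for token_id, candidates in zip(token_ids, logprobs)
        }

    def decode(self, token_ids: List[int]) -> str:
        # Concatenate the token text for each ID; unknown IDs raise KeyError.
        return "".join(self._id_to_text[t] for t in token_ids)


ids = [11, 22, 33]
ctx = [
    {11: Entry(-0.1, "The")},
    {22: Entry(-0.3, " cat")},
    {33: Entry(-0.2, " sat")},
]
tok = PseudoTokenizerSketch()
tok.set_context(ids, ctx)
prefix = tok.decode(ids[0:2])
```

The sketch shows why set_context() must come first: without the trajectory's logprobs there is no table from which decode() can recover token text.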

score(token_ids: List[int], logprobs: List[Dict]) → float[source]

Compute uncertainty score from token IDs and logprobs.

Can be used standalone on pre-extracted logprobs, or called internally by generate(). Mirrors VLLMWithUncertainty.score().

Args:

token_ids: Pseudo token IDs (from convert_api_logprobs).

logprobs: Logprob dicts in vLLM-compatible format.

Returns:

Uncertainty score (float). Higher = more uncertain.
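For intuition about "higher = more uncertain", the sketch below computes a mean token entropy directly from vLLM-style logprob dicts. It is not the code path score() uses: the actual value depends on the configured stat calculators and estimator. The function name, the Entry type, and the renormalisation over the returned candidates are all assumptions made for illustration.

```python
import math
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a logprob entry exposing .logprob and .token."""
    logprob: float
    token: str


def mean_token_entropy_sketch(logprobs: List[Dict[int, Entry]]) -> float:
    """Average per-step entropy over the candidates returned at each step.

    Assumption: only the top-k candidates are available, so probabilities
    are renormalised to sum to 1 within each step.
    """
    entropies = []
    for candidates in logprobs:
        probs = [math.exp(e.logprob) for e in candidates.values()]
        total = sum(probs)
        step_entropy = -sum((p / total) * math.log(p / total) for p in probs if p > 0)
        entropies.append(step_entropy)
    return sum(entropies) / len(entropies) if entropies else 0.0


# One step where the model is confident, one where it is split evenly.
confident = [{1: Entry(math.log(0.99), "a"), 2: Entry(math.log(0.01), "b")}]
unsure = [{1: Entry(math.log(0.5), "a"), 2: Entry(math.log(0.5), "b")}]

score_confident = mean_token_entropy_sketch(confident)
score_unsure = mean_token_entropy_sketch(unsure)
```

An even split between two candidates gives the maximum two-way entropy of ln 2, while a 99/1 split gives a value close to zero, matching the documented direction of the score.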

lm_polygraph.utils.api_with_uncertainty.convert_api_logprobs(api_logprobs: List[Dict]) → tuple[source]

Convert OpenAI API logprobs to lm-polygraph/vLLM format.

API returns: [{token: str, logprob: float, top_logprobs: [{token, logprob}]}]

lm-polygraph expects: (List[int], List[Dict[int -> obj_with_logprob_attr]])

Uses hash-based pseudo token IDs, since the API does not provide real token IDs.

Args:

api_logprobs: List of logprob entries from OpenAI API.

Returns:

Tuple of (pseudo_token_ids, vllm_format_logprobs).
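The conversion can be sketched as follows. This is a minimal illustration under assumptions, not the library's implementation: the names LogprobEntry, pseudo_token_id, and convert_sketch are hypothetical, and CRC32 is used only as one example of a deterministic hash, since the actual hashing scheme is not documented here.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogprobEntry:
    """Hypothetical stand-in for APILogprobData (logprob plus token text)."""
    logprob: float
    token: str


def pseudo_token_id(token: str) -> int:
    # Assumption: CRC32 of the UTF-8 bytes. Deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(token.encode("utf-8"))


def convert_sketch(api_logprobs: List[Dict]) -> Tuple[List[int], List[Dict[int, LogprobEntry]]]:
    token_ids: List[int] = []
    step_logprobs: List[Dict[int, LogprobEntry]] = []
    for entry in api_logprobs:
        candidates: Dict[int, LogprobEntry] = {}
        # Alternatives reported by the API for this position, if any.
        for alt in entry.get("top_logprobs") or []:
            candidates[pseudo_token_id(alt["token"])] = LogprobEntry(alt["logprob"], alt["token"])
        # Always include the token that was actually generated.
        chosen_id = pseudo_token_id(entry["token"])
        candidates[chosen_id] = LogprobEntry(entry["logprob"], entry["token"])
        token_ids.append(chosen_id)
        step_logprobs.append(candidates)
    return token_ids, step_logprobs


api_logprobs = [
    {"token": "Hello", "logprob": -0.1,
     "top_logprobs": [{"token": "Hello", "logprob": -0.1}, {"token": "Hi", "logprob": -2.5}]},
    {"token": "!", "logprob": -0.4, "top_logprobs": []},
]
ids, converted = convert_sketch(api_logprobs)
```

Hash-based IDs keep the same token text mapped to the same ID within and across calls, which is what lets a pseudo-tokenizer look text back up later; distinct tokens could in principle collide, which real token IDs would avoid.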