lm_polygraph.utils package

Subpackages

Submodules

lm_polygraph.utils.cir_model module

This module contains the CenteredIsotonicRegression class. Copied with minor modifications from https://github.com/mathijs02/cir-model/blob/main/src/cir_model/cir_model.py

class lm_polygraph.utils.cir_model.CenteredIsotonicRegression(non_centered_points: List[float | int] = [0, 1], **kwargs: Any)[source]

Bases: IsotonicRegression

Centered Isotonic Regression (CIR) model. CIR is described in [1] and is similar to Isotonic Regression (IR). Compared to IR, CIR adds the constraint that the resulting function must be strictly monotonic: ranges of constant function values are prevented as much as possible. The CenteredIsotonicRegression class inherits all methods and attributes from the scikit-learn IsotonicRegression implementation and is therefore compatible with other components of the scikit-learn library, such as pipelines.

Parameters

This class takes the same parameters and has the same attributes as IsotonicRegression from scikit-learn [2]. For full documentation of IsotonicRegression, see: https://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html

CenteredIsotonicRegression takes one additional parameter:

non_centered_points: list, default: [0, 1]

A list of y values that should not be collapsed in the CIR algorithm. In the original CIR algorithm, y values of 0 and 1 are treated differently by not collapsing them. This is because CIR is typically used for a binary target variable. The default behaviour can be overruled by passing a list of values for non_centered_points. An empty list means that no points are treated differently.

References

Examples

>>> from lm_polygraph.utils.cir_model import CenteredIsotonicRegression
>>> x = [1, 2, 3, 4]
>>> y = [1, 21, 41, 34]
>>> model = CenteredIsotonicRegression().fit(x, y)
>>> model.transform(x)
array([ 1. , 21. , 32. , 37.5])
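
The numbers in the example above can be reproduced by hand. The following is a simplified sketch of the idea, not the library's implementation: it pools adjacent violators as plain isotonic regression does, places each pooled block at the mean x of its points, and interpolates linearly between block centers, holding the end values constant. It assumes x is sorted with distinct values, and it ignores sample_weight and the special handling of non_centered_points. The names pool_blocks and cir_predict are hypothetical.

```python
from typing import List, Tuple


def pool_blocks(x: List[float], y: List[float]) -> List[Tuple[float, float]]:
    """Pool adjacent violators; return (mean x, mean y) for each block."""
    blocks = []  # each block is [sum of x, sum of y, number of points]
    for xi, yi in zip(x, y):
        blocks.append([float(xi), float(yi), 1])
        # Merge backwards while the previous block mean exceeds the new one.
        while len(blocks) > 1 and (
            blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]
        ):
            sx, sy, n = blocks.pop()
            blocks[-1][0] += sx
            blocks[-1][1] += sy
            blocks[-1][2] += n
    return [(sx / n, sy / n) for sx, sy, n in blocks]


def cir_predict(centers: List[Tuple[float, float]], x: float) -> float:
    """Interpolate linearly between block centers, clipping at both ends."""
    if x <= centers[0][0]:
        return centers[0][1]
    if x >= centers[-1][0]:
        return centers[-1][1]
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("centers must be sorted by x")


x = [1, 2, 3, 4]
y = [1, 21, 41, 34]
centers = pool_blocks(x, y)  # the last two points are pooled into one block
fitted = [cir_predict(centers, xi) for xi in x]
print(fitted)  # [1.0, 21.0, 32.0, 37.5]
```

Plain isotonic regression would assign the pooled value 37.5 to both x = 3 and x = 4. Centering the block at x = 3.5 is what removes the plateau and yields 32 at x = 3.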
fit(X: ndarray | List, y: ndarray | List, sample_weight: ndarray | List | None = None) CenteredIsotonicRegression[source]

Fit the model using X, y and optionally sample_weight as training data. This method takes the same parameters and returns the same objects as fit from IsotonicRegression. For full documentation of IsotonicRegression, see: https://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html#sklearn.isotonic.IsotonicRegression.fit

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self: object

The updated object.

set_predict_request(*, T: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

T: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for T parameter in predict.

Returns

self: object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self: object

The updated object.

set_transform_request(*, T: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

T: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for T parameter in transform.

Returns

self: object

The updated object.

lm_polygraph.utils.common module

lm_polygraph.utils.common.load_external_module(path_to_file: str)[source]

Load external module from file and return it.
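
As an illustration of what loading a module from a file path involves, here is a minimal sketch built on the standard library's importlib. It is not the source of load_external_module; the helper name load_module_from_path and the choice to name the module after the file stem are assumptions.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(path_to_file: str) -> ModuleType:
    """Import the Python file at `path_to_file` and return the module object."""
    module_name = Path(path_to_file).stem
    spec = importlib.util.spec_from_file_location(module_name, path_to_file)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path_to_file!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Usage: write a tiny module to a temporary file and load it.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_estimator.py"
    source.write_text("ANSWER = 42\n")
    loaded = load_module_from_path(str(source))

print(loaded.ANSWER)  # 42
```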

lm_polygraph.utils.common.polygraph_module_init(func)[source]
lm_polygraph.utils.common.seq_man_key(metric_name: str) Tuple[str, str][source]

Converts a metric name to the sequence-level key format used in a saved manager archive.

lm_polygraph.utils.dataset module

class lm_polygraph.utils.dataset.Dataset(x: List[str], y: List[str], batch_size: int)[source]

Bases: object

Seq2seq dataset for calculating quality of uncertainty estimation method.

static from_csv(csv_path: str, x_column: str, y_column: str, batch_size: int, prompt: str = '', **kwargs)[source]

Creates the dataset from a .csv table.

Parameters:

csv_path (str): path to the .csv table.

x_column (str): name of the column to take input texts from.

y_column (str): name of the column to take target texts from.

batch_size (int): the size of the texts batch.
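
To make the roles of x_column, y_column and batch_size concrete, here is a minimal standard-library sketch of reading two columns from a CSV table and iterating over them in batches. It is an illustration under assumptions, not the Dataset implementation: the helpers read_xy and batches are hypothetical, and the prompt argument is left out because this page does not describe how it is applied.

```python
import csv
import io
from typing import Iterator, List, Tuple


def read_xy(csv_file, x_column: str, y_column: str) -> Tuple[List[str], List[str]]:
    """Read input and target texts from two named columns of a CSV file."""
    rows = list(csv.DictReader(csv_file))
    return [r[x_column] for r in rows], [r[y_column] for r in rows]


def batches(
    x: List[str], y: List[str], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield consecutive (inputs, targets) batches of at most batch_size texts."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


# An in-memory table stands in for a file on disk.
table = io.StringIO("question,answer\nQ1,A1\nQ2,A2\nQ3,A3\n")
x, y = read_xy(table, "question", "answer")
batch_list = list(batches(x, y, batch_size=2))
print(batch_list)  # [(['Q1', 'Q2'], ['A1', 'A2']), (['Q3'], ['A3'])]
```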

static from_datasets(dataset_path: str | List[str], x_column: str, y_column: str, batch_size: int, prompt: str = '', description: str = '', mmlu_max_subject_size: int = 100, n_shot: int = 0, few_shot_split: str = 'train', few_shot_prompt: str | None = None, instruct: bool = False, split: str = 'test', size: int | None = None, **kwargs)[source]

Creates the dataset from Huggingface datasets.

Parameters:

dataset_path (str): HF path to the dataset.

x_column (str): name of the column to take input texts from.

y_column (str): name of the column to take target texts from.

batch_size (int): the size of the texts batch.

prompt (str): prompt template to use for input texts (default: ‘’).

split (str): dataset split to take data from (default: ‘test’).

size (Optional[int]): size to subsample the dataset to. If None, the full dataset split will be taken. Default: None.

static load(path_or_path_and_files: str | List[str], *args, **kwargs)[source]

Creates the dataset from either local .csv path (if such exists) or Huggingface datasets. See from_csv and from_datasets static functions for the description of *args and **kwargs arguments.

Parameters:

path_or_path_and_files (str or List[str]): local path to .csv table or HF path to dataset.

static load_hf_dataset(path: str | List[str], split: str, **kwargs)[source]
select(indices: List[int])[source]

Shrinks the dataset down to only the texts at the specified indices.

Parameters:

indices (List[int]): indices to keep in the dataset. Must have the same length as input texts.

subsample(size: int, seed: int)[source]

Subsamples the dataset to the provided size.

Parameters:

size (int): size of the resulting dataset.

seed (int): seed to perform random subsampling with.
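
The behaviour of select and subsample can be pictured with a small stand-in class. This is a sketch under assumptions, not the Dataset source: the class TextPairs is hypothetical, and using random.Random(seed).sample for the seeded draw is one possible choice.

```python
import random
from typing import List


class TextPairs:
    """Hypothetical stand-in holding parallel input and target texts."""

    def __init__(self, x: List[str], y: List[str]):
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self.x, self.y = list(x), list(y)

    def select(self, indices: List[int]) -> "TextPairs":
        """Keep only the texts at the given positions, in the given order."""
        self.x = [self.x[i] for i in indices]
        self.y = [self.y[i] for i in indices]
        return self

    def subsample(self, size: int, seed: int) -> "TextPairs":
        """Keep a reproducible random subset of at most `size` texts."""
        rng = random.Random(seed)
        k = min(size, len(self.x))
        return self.select(rng.sample(range(len(self.x)), k))


data = TextPairs(["a", "b", "c", "d"], ["A", "B", "C", "D"])
data.select([0, 2])
print(data.x, data.y)  # ['a', 'c'] ['A', 'C']

# The same seed gives the same subset on every run.
first = TextPairs(list("abcdef"), list("ABCDEF")).subsample(3, seed=7)
second = TextPairs(list("abcdef"), list("ABCDEF")).subsample(3, seed=7)
```

Because inputs and targets are indexed with the same positions, each input stays paired with its target after either operation.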

train_test_split(test_size: int, seed: int, split: str = 'train')[source]

Splits the dataset into train and test parts.

Parameters:

test_size (int): size of the test dataset.

seed (int): seed to perform random splitting with.

split (str): either ‘train’ or ‘test’. If ‘train’, keeps only the train data in the current dataset object. If ‘test’, keeps only the test data. Default: ‘train’.

Returns:

Tuple[List[str], List[str], List[str], List[str]]: train input and target texts list, test input and target texts list.

lm_polygraph.utils.deberta module

class lm_polygraph.utils.deberta.Deberta(deberta_path: str = 'microsoft/deberta-large-mnli', batch_size: int = 10, device=None)[source]

Bases: object

Allows for the implementation of a singleton DeBERTa model which can be shared across different uncertainty estimation methods in the code.

property deberta
property deberta_tokenizer
setup()[source]

Loads and prepares the DeBERTa model from the specified path.

to(device)[source]
class lm_polygraph.utils.deberta.MultilingualDeberta(deberta_path: str = 'MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7', batch_size: int = 10, device=None)[source]

Bases: Deberta

Allows for the implementation of a singleton multilingual DeBERTa model which can be shared across different uncertainty estimation methods in the code.

setup()[source]

Loads and prepares the DeBERTa model from the specified path.
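
The idea of one expensive model shared by several uncertainty estimation methods can be sketched as a holder that builds its resource on first access. This is only an illustration: the class LazyShared is hypothetical, and the assumption that the deberta and deberta_tokenizer properties trigger setup() on first use is not stated on this page.

```python
from typing import Callable, Optional


class LazyShared:
    """Hypothetical holder that builds an expensive resource on first use."""

    def __init__(self, loader: Callable[[], object]):
        self._loader = loader
        self._resource: Optional[object] = None
        self.load_count = 0

    def setup(self) -> None:
        """Build the resource once; later calls are no-ops."""
        if self._resource is None:
            self._resource = self._loader()
            self.load_count += 1

    @property
    def resource(self) -> object:
        """Return the resource, loading it on first access."""
        self.setup()
        return self._resource


shared = LazyShared(loader=lambda: {"name": "stand-in for a large NLI model"})
estimator_a = shared  # two consumers hold the same holder object
estimator_b = shared
same_object = estimator_a.resource is estimator_b.resource
print(same_object, shared.load_count)  # True 1
```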

lm_polygraph.utils.generation_parameters module

class lm_polygraph.utils.generation_parameters.GenerationParameters(temperature: float = 1.0, top_k: int = 50, top_p: float = 1.0, do_sample: bool = False, num_beams: int = 1, presence_penalty: float = 0.0, repetition_penalty: float = 1.0, generate_until: tuple = (), allow_newlines: bool = True)[source]

Bases: object

Parameters to override in model generation.

Parameters:

temperature (float): Temperature in sampling generation. Has no effect when do_sample is not set. Default: 1.0.

top_k (int): Number of top-k token predictions to consider in sampling generation. Has no effect when do_sample is not set. Default: 50.

top_p (float): Only consider the most probable tokens whose probabilities sum up to top_p. Has no effect when do_sample is not set. Default: 1.0.

do_sample (bool): If true, sample from the model’s probabilities. If false, only generate the token with maximum probability. Default: False.

num_beams (int): Number of beams if beam search generation is used. Has no effect when do_sample is not set. Default: 1.

presence_penalty (float): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. Applied for OpenAI-API blackbox models. Default: 0.0.

repetition_penalty (float): The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Applied for whitebox models from HuggingFace. Default: 1.0.

allow_newlines (bool): If False, the model is not allowed to generate tokens with newlines. Default: True.

allow_newlines: bool = True
do_sample: bool = False
generate_until: tuple = ()
num_beams: int = 1
presence_penalty: float = 0.0
repetition_penalty: float = 1.0
temperature: float = 1.0
top_k: int = 50
top_p: float = 1.0
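
Overriding only some generation parameters while keeping the remaining defaults can be sketched with a dataclass that mirrors the fields listed above. The class name GenerationParametersSketch is hypothetical, and whether the real class is implemented as a dataclass is an assumption; the field names and defaults are taken from the signature on this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationParametersSketch:
    """Hypothetical mirror of the documented fields and defaults."""

    temperature: float = 1.0
    top_k: int = 50
    top_p: float = 1.0
    do_sample: bool = False
    num_beams: int = 1
    presence_penalty: float = 0.0
    repetition_penalty: float = 1.0
    generate_until: tuple = ()
    allow_newlines: bool = True


# Override only the fields that should differ from the defaults.
overrides = {"do_sample": True, "temperature": 0.7, "top_k": 10}
params = GenerationParametersSketch(**overrides)
print(asdict(params))
```

Fields that are not named in overrides, such as top_p and num_beams, keep their default values.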

lm_polygraph.utils.manager module

class lm_polygraph.utils.manager.UEManager(data: Dataset, model: Model, estimators: List[Estimator], generation_metrics: List[GenerationMetric], ue_metrics: List[UEMetric], processors: List[Processor], train_data: Dataset | None = None, background_train_data: Dataset | None = None, ignore_exceptions: bool = True, ensemble_model: WhiteboxModel | None = None, deberta_batch_size: int = 10, deberta_device: str | None = None, language: str = 'en', verbose: bool = True, max_new_tokens: int = 100, background_train_dataset_max_new_tokens: int = 100, cache_path='/home/docs/.cache')[source]

Bases: object

Manager to conduct uncertainty estimation experiments by using several uncertainty methods, ground-truth uncertainty values and correlation metrics at once. Used for running benchmarks.

Examples:

>>> from lm_polygraph import WhiteboxModel
>>> from lm_polygraph.utils.dataset import Dataset
>>> from lm_polygraph.utils.manager import UEManager
>>> from lm_polygraph.estimators import *
>>> from lm_polygraph.ue_metrics import *
>>> from lm_polygraph.generation_metrics import *
>>> model = WhiteboxModel.from_pretrained(
...     'bigscience/bloomz-560m',
...     device='cuda:0',
... )
>>> dataset = Dataset.load(
...     '../workdir/data/triviaqa.csv',
...     'question', 'answer',
...     batch_size=4,
... )
>>> ue_methods = [MaximumSequenceProbability(), SemanticEntropy()]
>>> ue_metrics = [RiskCoverageCurveAUC()]
>>> ground_truth = [RougeMetric('rougeL'), BartScoreSeqMetric('rh')]
>>> man = UEManager(dataset, model, ue_methods, ground_truth, ue_metrics, processors=[])
>>> results = man()
>>> man.save("./manager.man")

calculate(batch_stats: dict, calculators: list, inp_texts: list) dict[source]

Runs stat calculators and handles errors if any occur. Returns updated batch stats.

Parameters:

batch_stats (dict): contains current batch statistics to be updated.

calculators (list): list of stat calculators to run.

inp_texts (list): list of inputs to the model in the batch.

estimate(batch_stats: dict, estimators: list) Dict[Tuple[str, str], List[float]][source]

Runs the uncertainty estimators and handles errors if any occur. Returns the estimations calculated for the batch.

Parameters:

batch_stats (dict): contains current batch statistics to be updated.

estimators (list): list of estimators to run.

static load(load_path: str) UEManager[source]

Loads UEManager from the specified path. To save the calculated manager results, see UEManager.save().

Parameters:

load_path (str): Path to file with saved benchmark results to load.

save(save_path: str)[source]

Saves the run results to the provided path. Raises an exception if no results have been calculated yet. To load the saved manager, see UEManager.load().

Parameters:

save_path (str): Path to file to save benchmark results to.

class lm_polygraph.utils.manager.UncertaintyOutput(uncertainty: float | List[float], input_text: str, generation_text: str, generation_tokens: List[int], model_path: str, estimator: str)[source]

Bases: object

Uncertainty estimator output.

Parameters:

uncertainty (float): uncertainty estimation.

input_text (str): text used as model input.

generation_text (str): text generated by the model.

model_path (str): path to the model used in generation.

estimator: str
generation_text: str
generation_tokens: List[int]
input_text: str
model_path: str
uncertainty: float | List[float]
lm_polygraph.utils.manager.estimate_uncertainty(model: Model, estimator: Estimator, input_text: str) UncertaintyOutput[source]

Estimates the uncertainty of the model generation using the provided estimator.

Parameters:
model (Model): model to estimate uncertainty of. Either lm_polygraph.WhiteboxModel or lm_polygraph.BlackboxModel model can be used.

estimator (Estimator): uncertainty estimation method to use. Can be any of the methods at lm_polygraph.estimators.

input_text (str): text to estimate uncertainty of.

Returns:

UncertaintyOutput: uncertainty estimation float along with supporting info.

Examples:

>>> from lm_polygraph import WhiteboxModel
>>> from lm_polygraph.estimators import LexicalSimilarity
>>> from lm_polygraph.utils.manager import estimate_uncertainty
>>> model = WhiteboxModel.from_pretrained(
...     'bigscience/bloomz-560m',
...     device='cpu',
... )
>>> estimator = LexicalSimilarity('rougeL')
>>> estimate_uncertainty(model, estimator, input_text='Who is George Bush?')
UncertaintyOutput(uncertainty=-0.9176470588235295, input_text='Who is George Bush?', generation_text=' President of the United States', model_path='bigscience/bloomz-560m')

>>> from lm_polygraph import BlackboxModel
>>> from lm_polygraph.estimators import EigValLaplacian
>>> from lm_polygraph.utils.manager import estimate_uncertainty
>>> model = BlackboxModel.from_openai(
...     'YOUR_OPENAI_TOKEN',
...     'gpt-3.5-turbo'
... )
>>> estimator = EigValLaplacian()
>>> estimate_uncertainty(model, estimator, input_text='When did Albert Einstein die?')
UncertaintyOutput(uncertainty=1.0022274826855433, input_text='When did Albert Einstein die?', generation_text='Albert Einstein died on April 18, 1955.', model_path='gpt-3.5-turbo')

lm_polygraph.utils.model module

class lm_polygraph.utils.model.BlackboxModel(openai_api_key: str | None = None, model_path: str | None = None, hf_api_token: str | None = None, parameters: GenerationParameters = GenerationParameters(temperature=1.0, top_k=50, top_p=1.0, do_sample=False, num_beams=1, presence_penalty=0.0, repetition_penalty=1.0, generate_until=(), allow_newlines=True))[source]

Bases: Model

Black-box model class. Has no access to model scores and logits. Currently implemented blackbox models: OpenAI models, Huggingface models.

Examples:

>>> from lm_polygraph import BlackboxModel
>>> model = BlackboxModel.from_openai(
...     'YOUR_OPENAI_TOKEN',
...     'gpt-3.5-turbo'
... )

>>> from lm_polygraph import BlackboxModel
>>> model = BlackboxModel.from_huggingface(
...     hf_api_token='YOUR_API_TOKEN',
...     hf_model_id='google/t5-large-ssm-nqo'
... )

static from_huggingface(hf_api_token: str, hf_model_id: str, **kwargs)[source]

Initializes a blackbox model from huggingface.

Parameters:

hf_api_token (Optional[str]): Huggingface API token if the blackbox model comes from HF. Default: None.

hf_model_id (Optional[str]): model path in huggingface.

static from_openai(openai_api_key: str, model_path: str, **kwargs)[source]

Initializes a blackbox model from OpenAI API.

Parameters:

openai_api_key (Optional[str]): OpenAI API key. Default: None.

model_path (Optional[str]): model name in OpenAI.

generate(**args)[source]

Not implemented for blackbox models.

generate_texts(input_texts: List[str], **args) List[str][source]

Generates a list of model answers using input texts batch.

Parameters:

input_texts (List[str]): input texts batch.

Return:

List[str]: corresponding model generations. Has the same length as input_texts.

tokenizer(*args, **kwargs)[source]

Not implemented for blackbox models.

class lm_polygraph.utils.model.Model(model_path: str, model_type: str)[source]

Bases: ABC

Abstract model class. Used as base class for both White-box models and Black-box models.

abstract generate(**args)[source]

Abstract method. Generates the model output with scores from batch formed by HF Tokenizer. Not implemented for black-box models.

abstract generate_texts(input_texts: List[str], **args) List[str][source]

Abstract method. Generates a list of model answers using input texts batch.

Parameters:

input_texts (List[str]): input texts batch.

Return:

List[str]: corresponding model generations. Has the same length as input_texts.
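
The split between the abstract interface and a black-box implementation can be sketched as follows. This is an illustration under assumptions, not the library source: ModelSketch and EchoBlackbox are hypothetical names, and the model_type value "Blackbox" is made up for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class ModelSketch(ABC):
    """Hypothetical mirror of the abstract interface described above."""

    def __init__(self, model_path: str, model_type: str):
        self.model_path = model_path
        self.model_type = model_type

    @abstractmethod
    def generate(self, **args):
        """Return model output with scores (white-box models only)."""

    @abstractmethod
    def generate_texts(self, input_texts: List[str], **args) -> List[str]:
        """Return one generated text per input text."""


class EchoBlackbox(ModelSketch):
    """Toy black-box model: returns texts only, no scores or logits."""

    def __init__(self):
        super().__init__(model_path="echo", model_type="Blackbox")

    def generate(self, **args):
        raise NotImplementedError("Black-box models expose no scores or logits.")

    def generate_texts(self, input_texts: List[str], **args) -> List[str]:
        return [f"echo: {text}" for text in input_texts]


model = EchoBlackbox()
outputs = model.generate_texts(["Who wrote Hamlet?", "2 + 2?"])
print(outputs)  # ['echo: Who wrote Hamlet?', 'echo: 2 + 2?']
```

Code that only needs generated text can depend on generate_texts alone and will then work with either kind of model.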

class lm_polygraph.utils.model.WhiteboxModel(model: AutoModelForCausalLM, tokenizer: AutoTokenizer, model_path: str | None = None, model_type: str = 'CausalLM', generation_parameters: GenerationParameters = GenerationParameters(temperature=1.0, top_k=50, top_p=1.0, do_sample=False, num_beams=1, presence_penalty=0.0, repetition_penalty=1.0, generate_until=(), allow_newlines=True))[source]

Bases: Model

White-box model class. Has access to model scores and logits. Currently implemented only for Huggingface models.

Examples:

>>> from lm_polygraph import WhiteboxModel
>>> model = WhiteboxModel.from_pretrained(
...     "bigscience/bloomz-3b",
... )

device()[source]

Returns the device the model is currently loaded on.

Returns:

str: device string.

static from_pretrained(model_path: str, generation_params: Dict | None = {}, add_bos_token: bool = True, **kwargs)[source]

Initializes the model from HuggingFace. Automatically determines model type.

Parameters:

model_path (str): model path in HuggingFace.

generation_params (Dict): generation arguments for lm_polygraph.utils.generation_parameters.GenerationParameters.

add_bos_token (bool): tokenizer argument. Default: True.

generate(**args)[source]

Generates the model output with scores from batch formed by HF Tokenizer.

Parameters:

**args: Any arguments that can be passed to model.generate function from HuggingFace.

Returns:

ModelOutput: HuggingFace generation output with scores overridden with original probabilities.

generate_texts(input_texts: List[str], **args) List[str][source]

Generates a list of model answers using input texts batch.

Parameters:

input_texts (List[str]): input texts batch.

Return:

List[str]: corresponding model generations. Has the same length as input_texts.

get_stopping_criteria(input_ids: Tensor)[source]
tokenize(texts: List[str] | List[List[Dict[str, str]]]) Dict[str, Tensor][source]

Tokenizes input texts batch into a dictionary using the model tokenizer.

Parameters:

texts (List[str]): list of input texts batch.

Returns:

dict[str, torch.Tensor]: tensors dictionary obtained by tokenizing input texts batch.

lm_polygraph.utils.model.create_ensemble(models: List[WhiteboxModel] = [], mc: bool = False, seed: int = 1, mc_seeds: List[int] = [1], ensembling_mode: str = 'pe', dropout_rate: float = 0.1, **kwargs) WhiteboxModel[source]

lm_polygraph.utils.normalize module

lm_polygraph.utils.normalize.filter_nans(gen_metrics: ndarray, ues: ndarray) Tuple[ndarray, ndarray][source]

Filters out NaNs from gen_metrics and ues if they occur in at least one of the arrays.

Args:

gen_metrics: Array of gen_metrics

ues: Array of ues

Returns: Tuple of two arrays:

  • First array contains gen_metrics with NaNs removed

  • Second array contains ues with NaNs removed
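
The pairing rule matters here: a position is dropped from both outputs when either input holds a NaN, so the two results stay aligned. The sketch below shows that rule on plain Python lists. It is not the library function, which works on ndarray inputs, and the name filter_nan_pairs is hypothetical.

```python
import math
from typing import List, Tuple


def filter_nan_pairs(
    gen_metrics: List[float], ues: List[float]
) -> Tuple[List[float], List[float]]:
    """Drop every position where either value is NaN, keeping pairs aligned."""
    kept = [
        (g, u)
        for g, u in zip(gen_metrics, ues)
        if not (math.isnan(g) or math.isnan(u))
    ]
    return [g for g, _ in kept], [u for _, u in kept]


nan = float("nan")
metrics, uncertainties = filter_nan_pairs(
    [0.9, nan, 0.4, 0.7],
    [0.1, 0.5, nan, 0.3],
)
print(metrics, uncertainties)  # [0.9, 0.7] [0.1, 0.3]
```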

lm_polygraph.utils.normalize.get_mans_ues_metrics(man_paths: List[str], ue_method_names: List[str], gen_metric_names: List[str]) Tuple[Dict[str, ndarray], Dict[str, ndarray]][source]

Extracts and concats data from a list of paths to saved manager data files.

Args:

man_paths: List of paths to manager data files

ue_method_names: List of UE methods to extract

gen_metric_names: List of gen_metrics to extract

Returns: Tuple of two dictionaries:

  • First dictionary contains UE method data, where keys are method names and values are concatenated arrays of UE method data from all managers

  • Second dictionary contains gen_metric data, where keys are metric names and values are concatenated arrays of gen_metric data from all managers

lm_polygraph.utils.ood_detection module

lm_polygraph.utils.ood_detection.calculate_ood_from_mans(manager_id, manager_ood, ood_metrics)[source]

lm_polygraph.utils.openai_chat module

class lm_polygraph.utils.openai_chat.OpenAIChat(openai_model: str = 'gpt-4o', cache_path: str = '/home/docs/.cache')[source]

Bases: object

Allows for the implementation of a singleton class to chat with OpenAI model for dataset marking.

ask(message: str) str[source]

lm_polygraph.utils.processor module

class lm_polygraph.utils.processor.Logger[source]

Bases: Processor

Processor logging batch information to stdout.

on_batch(batch_stats: Dict[str, ndarray], batch_gen_metrics: Dict[Tuple[str, str], List[float]], batch_estimations: Dict[Tuple[str, str], List[float]])[source]

Outputs statistics from batch_stats, batch_gen_metrics and batch_estimations to stdout.

on_eval(metrics: Dict[Tuple[str, str, str, str], float], bad_estimators: Dict[Estimator, int])[source]

Outputs statistics from metrics and failed estimators to stdout.

class lm_polygraph.utils.processor.Processor[source]

Bases: object

Abstract class to perform actions after processing new texts batch.

on_batch(batch_stats: Dict[str, ndarray], batch_gen_metrics: Dict[Tuple[str, str], List[float]], batch_estimations: Dict[Tuple[str, str], List[float]])[source]

Processes new batch.

Parameters:

batch_stats (Dict[str, np.ndarray]): Dictionary of statistics calculated with stat_calculators.

batch_gen_metrics (Dict[Tuple[str, str], List[float]]): Dictionary of generation metrics calculated for the batch. Dictionary keys consist of UE level (sequence or token) and generation metrics name.

batch_estimations (Dict[Tuple[str, str], List[float]]): Dictionary of UE estimations calculated for the batch. Dictionary keys consist of UE level (sequence or token) and UE estimator name.

on_eval(metrics: Dict[Tuple[str, str, str, str], float])[source]

Processes newly calculated evaluation metrics.

Parameters:
metrics (Dict[Tuple[str, str, str, str], float]): metrics calculated using ue_metrics on the batch which was considered at the last on_batch call. Dictionary keys consist of UE level, estimator name, generation metrics name and ue_metrics name which was used to calculate quality metrics between this estimator’s uncertainty estimations and generation metric outputs.
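
A custom processor overrides one or both hooks. The sketch below uses a stand-in base class so that it runs without the library installed. ProcessorSketch and EstimationCollector are hypothetical names, and the key strings in the usage lines are made-up examples of the (level, name) tuples described above.

```python
from typing import Dict, List, Tuple


class ProcessorSketch:
    """Hypothetical stand-in for the base class: both hooks do nothing."""

    def on_batch(self, batch_stats, batch_gen_metrics, batch_estimations):
        pass

    def on_eval(self, metrics):
        pass


class EstimationCollector(ProcessorSketch):
    """Accumulates per-estimator uncertainty values across batches."""

    def __init__(self):
        self.estimations: Dict[Tuple[str, str], List[float]] = {}
        self.final_metrics: Dict[Tuple[str, str, str, str], float] = {}

    def on_batch(self, batch_stats, batch_gen_metrics, batch_estimations):
        # Append this batch's values under the same (level, estimator) key.
        for key, values in batch_estimations.items():
            self.estimations.setdefault(key, []).extend(values)

    def on_eval(self, metrics):
        self.final_metrics = dict(metrics)


collector = EstimationCollector()
key = ("sequence", "ExampleEstimator")  # illustrative key, not a real name
collector.on_batch({}, {}, {key: [0.2, 0.5]})
collector.on_batch({}, {}, {key: [0.1]})
collector.on_eval({("sequence", "ExampleEstimator", "ExampleMetric", "example_ue_metric"): 0.42})
print(collector.estimations[key])  # [0.2, 0.5, 0.1]
```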

lm_polygraph.utils.register_stat_calculators module

lm_polygraph.utils.register_stat_calculators.register_stat_calculators(deberta_batch_size: int = 10, deberta_device: str | None = None, language: str = 'en', n_ccp_alternatives: int = 10, cache_path='/home/docs/.cache', model: Model | None = None) Tuple[Dict[str, StatCalculator], Dict[str, List[str]]][source]

Registers all available statistic calculators to be seen by UEManager for properly organizing the calculations order.

lm_polygraph.utils.token_restoration module

lm_polygraph.utils.token_restoration.collect_sample_token_level_uncertainties(model_output, batch_size, num_return_sequences, vocab_size, pad_token_id, length_penalty=1.0, ensemble_uncertainties={})[source]
lm_polygraph.utils.token_restoration.collect_token_level_uncertainties(model_output, batch_size, beam_size, vocab_size, pad_token_id, length_penalty=1.0, ensemble_uncertainties={})[source]
lm_polygraph.utils.token_restoration.get_collect_fn(model_output)[source]
lm_polygraph.utils.token_restoration.update_token_level_scores(scores, batch_scores)[source]

Module contents