lm_polygraph.utils package
Subpackages
- lm_polygraph.utils.ensemble_utils package
- Submodules
- lm_polygraph.utils.ensemble_utils.dropout module
- lm_polygraph.utils.ensemble_utils.ensemble_beam module
BeamSearchEncoderDecoderOutput: beam_indices, cross_attentions, decoder_attentions, decoder_hidden_states, encoder_attentions, encoder_hidden_states, ep_uncertainties, models_beam_next_token_logits, models_scores, pe_uncertainties, scores, sequences, sequences_scores
EnsembleBeamSearchMixin
- lm_polygraph.utils.ensemble_utils.ensemble_generator module
EnsembleGenerationMixin: add_ensemble_models(), base_seed, calculate_entropy_based_measures(), ensembling_mode, mc, mc_models_num, mc_seeds, models, models_beam_logits_iter, tokenizer
- lm_polygraph.utils.ensemble_utils.ensemble_greedy module
EnsembleGreedyMixin
GreedySearchEncoderDecoderOutput: cross_attentions, decoder_attentions, decoder_hidden_states, encoder_attentions, encoder_hidden_states, ep_uncertainties, models_hypo_next_token_logits, models_scores, pe_uncertainties, scores, sequences, sequences_scores
- lm_polygraph.utils.ensemble_utils.ensemble_sample module
EnsembleSampleMixin
SampleEncoderDecoderOutput: cross_attentions, decoder_attentions, decoder_hidden_states, encoder_attentions, encoder_hidden_states, ep_uncertainties, models_beam_next_token_logits, models_scores, pe_uncertainties, scores, sequences
- Module contents
- lm_polygraph.utils.prompt_templates package
Submodules
lm_polygraph.utils.cir_model module
This module contains the CenteredIsotonicRegression class. Copied with minor modifications from https://github.com/mathijs02/cir-model/blob/main/src/cir_model/cir_model.py
- class lm_polygraph.utils.cir_model.CenteredIsotonicRegression(non_centered_points: List[float | int] = [0, 1], **kwargs: Any)[source]
Bases: IsotonicRegression
Centered Isotonic Regression (CIR) model. CIR is described in [1] and is similar to Isotonic Regression (IR). Compared to IR, CIR adds the constraint that the resulting function must be strictly monotonic: ranges of constant function values are prevented as much as possible. The CenteredIsotonicRegression class inherits all methods and attributes from the scikit-learn implementation of IsotonicRegression, so it is compatible with the other components of the scikit-learn library, such as pipelines.
Parameters
This class takes the same parameters and has the same attributes as IsotonicRegression from scikit-learn [2]. For full documentation of IsotonicRegression, see: https://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html
CenteredIsotonicRegression takes one additional parameter:
- non_centered_points (list, default: [0, 1])
A list of y values that should not be collapsed in the CIR algorithm. In the original CIR algorithm, y values of 0 and 1 are treated differently: they are not collapsed, because CIR is typically used for a binary target variable. The default behaviour can be overridden by passing a list of values for non_centered_points. An empty list means that no points are treated differently.
References
Examples
>>> from lm_polygraph.utils.cir_model import CenteredIsotonicRegression
>>> x = [1, 2, 3, 4]
>>> y = [1, 21, 41, 34]
>>> model = CenteredIsotonicRegression().fit(x, y)
>>> model.transform(x)
array([ 1. , 21. , 32. , 37.5])
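The centering idea described above can be illustrated with a dependency-free sketch. It is not the library's implementation: it assumes x is already sorted, uses unit sample weights, and ignores non_centered_points entirely. The helper names fit_cir_knots and predict_cir are hypothetical. Violating neighbours are pooled as in ordinary isotonic regression, each pooled block is then collapsed to a single knot at its mean x, and predictions interpolate linearly between knots.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def fit_cir_knots(x: Sequence[float], y: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, y) knots of a centered isotonic fit; x must be sorted ascending."""
    # Each block stores [sum of x, sum of y, number of points].
    blocks: List[List[float]] = []
    for xi, yi in zip(x, y):
        blocks.append([float(xi), float(yi), 1.0])
        # Pool neighbouring blocks while their means are not strictly increasing.
        while len(blocks) > 1 and (
            blocks[-2][1] / blocks[-2][2] >= blocks[-1][1] / blocks[-1][2]
        ):
            sum_x, sum_y, count = blocks.pop()
            blocks[-1][0] += sum_x
            blocks[-1][1] += sum_y
            blocks[-1][2] += count
    # Centering step: every pooled block becomes one knot at its mean x.
    return [(sum_x / count, sum_y / count) for sum_x, sum_y, count in blocks]


def predict_cir(knots: List[Tuple[float, float]], x_new: float) -> float:
    """Interpolate linearly between knots; stay constant outside their range."""
    xs = [knot[0] for knot in knots]
    ys = [knot[1] for knot in knots]
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x_new)
    slope = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + slope * (x_new - xs[i - 1])


knots = fit_cir_knots([1, 2, 3, 4], [1, 21, 41, 34])
fitted = [predict_cir(knots, value) for value in [1, 2, 3, 4]]
print(knots)   # the violating pair (3, 41), (4, 34) collapses to one knot at x = 3.5
print(fitted)
```

For this input the sketch produces the knots (1, 1), (2, 21) and (3.5, 37.5), and the fitted values agree with the documented example: the value at x = 3 is interpolated to 32 instead of being held constant at 37.5.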
- fit(X: ndarray | List, y: ndarray | List, sample_weight: ndarray | List | None = None) CenteredIsotonicRegression[source]
Fit the model using X, y and optionally sample_weight as training data. This method takes the same parameters and returns the same objects as fit from IsotonicRegression. For full documentation of IsotonicRegression, see: https://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html#sklearn.isotonic.IsotonicRegression.fit
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED)
Metadata routing for the sample_weight parameter in fit.
Returns
- self (object)
The updated object.
- set_predict_request(*, T: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- T (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED)
Metadata routing for the T parameter in predict.
Returns
- self (object)
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED)
Metadata routing for the sample_weight parameter in score.
Returns
- self (object)
The updated object.
- set_transform_request(*, T: bool | None | str = '$UNCHANGED$') CenteredIsotonicRegression
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- T (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED)
Metadata routing for the T parameter in transform.
Returns
- self (object)
The updated object.
lm_polygraph.utils.common module
lm_polygraph.utils.dataset module
- class lm_polygraph.utils.dataset.Dataset(x: List[str], y: List[str], batch_size: int)[source]
Bases: object
Seq2seq dataset for evaluating the quality of uncertainty estimation methods.
- static from_csv(csv_path: str, x_column: str, y_column: str, batch_size: int, prompt: str = '', **kwargs)[source]
Creates the dataset from a .csv table.
- Parameters:
- csv_path (str): path to the .csv table.
- x_column (str): name of the column to take input texts from.
- y_column (str): name of the column to take target texts from.
- batch_size (int): the size of the texts batch.
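As a rough illustration of what the parameters above describe, the following sketch reads two named columns from a CSV table and groups the rows into batches. The helpers read_text_pairs and iter_batches are hypothetical and are not part of lm_polygraph; the prompt argument and any extra keyword arguments of from_csv are deliberately left out because their handling is not described here.

```python
import csv
import io
from typing import Iterator, List, TextIO, Tuple


def read_text_pairs(csv_file: TextIO, x_column: str, y_column: str) -> Tuple[List[str], List[str]]:
    """Collect input and target texts from two named CSV columns."""
    inputs: List[str] = []
    targets: List[str] = []
    for row in csv.DictReader(csv_file):
        inputs.append(row[x_column])
        targets.append(row[y_column])
    return inputs, targets


def iter_batches(x: List[str], y: List[str], batch_size: int) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield aligned (inputs, targets) slices of at most batch_size items."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


# An in-memory table stands in for a file on disk.
table = io.StringIO(
    "question,answer\n"
    "Capital of France?,Paris\n"
    "2+2?,4\n"
    "Largest planet?,Jupiter\n"
)
inputs, targets = read_text_pairs(table, "question", "answer")
batch_list = list(iter_batches(inputs, targets, batch_size=2))
print(len(batch_list))  # three rows with batch_size=2 give two batches
```

The last batch is shorter than batch_size when the number of rows is not a multiple of it; whether the library pads, drops, or keeps such a batch is not stated in this reference.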
- static from_datasets(dataset_path: str | List[str], x_column: str, y_column: str, batch_size: int, prompt: str = '', description: str = '', mmlu_max_subject_size: int = 100, n_shot: int = 0, few_shot_split: str = 'train', few_shot_prompt: str | None = None, instruct: bool = False, split: str = 'test', size: int | None = None, **kwargs)[source]
Creates the dataset from Huggingface datasets.
- Parameters:
- dataset_path (str): HF path to the dataset.
- x_column (str): name of the column to take input texts from.
- y_column (str): name of the column to take target texts from.
- batch_size (int): the size of the texts batch.
- prompt (str): prompt template to use for input texts. Default: ''.
- split (str): dataset split to take data from. Default: 'test'.
- size (Optional[int]): size to subsample the dataset to. If None, the full dataset split will be taken. Default: None.
- static load(path_or_path_and_files: str | List[str], *args, **kwargs)[source]
Creates the dataset from either a local .csv path (if it exists) or Huggingface datasets. See the from_csv and from_datasets static functions for the description of the *args and **kwargs arguments.
- Parameters:
path_or_path_and_files (str or List[str]): local path to .csv table or HF path to dataset.
- select(indices: List[int])[source]
Shrinks the dataset down to only the texts at the specified indices.
- Parameters:
- indices (List[int]): indices to keep in the dataset. Must have the same length as input texts.
- subsample(size: int, seed: int)[source]
Subsamples the dataset to the provided size.
- Parameters:
- size (int): size of the resulting dataset.
- seed (int): seed to perform random subsampling with.
- train_test_split(test_size: int, seed: int, split: str = 'train')[source]
Splits the dataset into train and test parts.
- Parameters:
- test_size (int): size of the test dataset.
- seed (int): seed to perform random splitting with.
- split (str): either 'train' or 'test'. If 'train', keeps only train data in the current dataset object. If 'test', keeps only test data. Default: 'train'.
- Returns:
- Tuple[List[str], List[str], List[str], List[str]]: train input and target texts list, test input and target texts list.
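The select, subsample and train_test_split behaviour described above can be sketched with a small stand-in class. PairedTexts and Split are hypothetical names, the random module is used where the library may well use another generator, and the field order of Split is an assumption. The point of the sketch is that the same seed always yields the same selection and that inputs stay aligned with their targets.

```python
import random
from typing import List, NamedTuple, Sequence


class Split(NamedTuple):
    train_x: List[str]
    train_y: List[str]
    test_x: List[str]
    test_y: List[str]


class PairedTexts:
    """Hypothetical stand-in holding aligned input and target texts."""

    def __init__(self, x: Sequence[str], y: Sequence[str]) -> None:
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self.x, self.y = list(x), list(y)

    def select(self, indices: Sequence[int]) -> None:
        # Keep only the requested positions, in the given order.
        self.x = [self.x[i] for i in indices]
        self.y = [self.y[i] for i in indices]

    def subsample(self, size: int, seed: int) -> None:
        # A seeded generator makes the subsample reproducible.
        if size < len(self.x):
            self.select(random.Random(seed).sample(range(len(self.x)), size))

    def train_test_split(self, test_size: int, seed: int, split: str = "train") -> Split:
        order = list(range(len(self.x)))
        random.Random(seed).shuffle(order)
        test_idx, train_idx = order[:test_size], order[test_size:]
        result = Split(
            [self.x[i] for i in train_idx], [self.y[i] for i in train_idx],
            [self.x[i] for i in test_idx], [self.y[i] for i in test_idx],
        )
        # The object itself keeps only the part named by `split`.
        self.select(train_idx if split == "train" else test_idx)
        return result


data = PairedTexts([f"q{i}" for i in range(10)], [f"a{i}" for i in range(10)])
data.subsample(size=6, seed=0)
parts = data.train_test_split(test_size=2, seed=0)
print(len(data.x), len(parts.test_x))  # 4 train items remain in `data`, 2 test items
```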
lm_polygraph.utils.deberta module
- class lm_polygraph.utils.deberta.Deberta(deberta_path: str = 'microsoft/deberta-large-mnli', batch_size: int = 10, device=None)[source]
Bases: object
Allows for the implementation of a singleton DeBERTa model which can be shared across different uncertainty estimation methods in the code.
- property deberta
- property deberta_tokenizer
- class lm_polygraph.utils.deberta.MultilingualDeberta(deberta_path: str = 'MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7', batch_size: int = 10, device=None)[source]
Bases: Deberta
Allows for the implementation of a singleton multilingual DeBERTa model which can be shared across different uncertainty estimation methods in the code.
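One common way to share a single expensive model between several consumers is to load it lazily behind a property. The sketch below shows that pattern in isolation. LazyResource and fake_loader are hypothetical, and whether Deberta defers loading in exactly this way is an assumption: this reference only states that the model is shared and exposed through the deberta and deberta_tokenizer properties.

```python
from typing import Any, Callable, List


class LazyResource:
    """Hypothetical holder that builds an expensive object on first access."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value: Any = None
        self._loaded = False

    @property
    def value(self) -> Any:
        # Load once, then hand the same object to every caller.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


load_calls: List[int] = []


def fake_loader() -> dict:
    # Stand-in for an expensive model download and initialisation.
    load_calls.append(1)
    return {"name": "stand-in for a loaded NLI model"}


shared = LazyResource(fake_loader)
first = shared.value   # triggers the loader
second = shared.value  # reuses the cached object
print(first is second, len(load_calls))
```

Passing the same holder to several estimators means the model is built at most once, and not at all if no estimator ever touches it.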
lm_polygraph.utils.generation_parameters module
- class lm_polygraph.utils.generation_parameters.GenerationParameters(temperature: float = 1.0, top_k: int = 50, top_p: float = 1.0, do_sample: bool = False, num_beams: int = 1, presence_penalty: float = 0.0, repetition_penalty: float = 1.0, generate_until: tuple = (), allow_newlines: bool = True)[source]
Bases: object
Parameters to override in model generation.
- Parameters:
- temperature (float): Temperature in sampling generation. Has no effect when do_sample is not set. Default: 1.0.
- top_k (int): Number of top token predictions to consider in sampling generation. Has no effect when do_sample is not set. Default: 50.
- top_p (float): Only consider the most probable tokens whose probabilities sum up to top_p. Has no effect when do_sample is not set. Default: 1.0.
- do_sample (bool): If True, sample from the model's probabilities. If False, only generate the token with maximum probability. Default: False.
- num_beams (int): Number of beams if beam search generation is used. Has no effect when do_sample is not set. Default: 1.
- presence_penalty (float): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Applied for OpenAI-API blackbox models. Default: 0.0.
- repetition_penalty (float): The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Applied for whitebox models from HuggingFace. Default: 1.0.
- allow_newlines (bool): If False, the model is not allowed to generate tokens with newlines. Default: True.
- allow_newlines: bool = True
- do_sample: bool = False
- generate_until: tuple = ()
- num_beams: int = 1
- presence_penalty: float = 0.0
- repetition_penalty: float = 1.0
- temperature: float = 1.0
- top_k: int = 50
- top_p: float = 1.0
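To make the top_k and top_p descriptions concrete, here is a small worked sketch of the usual filtering idea on a toy next-token distribution. It is a generic illustration, not lm_polygraph or HuggingFace code: the function filter_top_k_top_p and the token probabilities are made up, and real implementations operate on logits tensors.

```python
from typing import Dict, List, Tuple


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most probable tokens, then the smallest prefix reaching top_p."""
    ranked: List[Tuple[str, float]] = sorted(
        probs.items(), key=lambda item: item[1], reverse=True
    )[:top_k]
    # Renormalise after the top-k cut so the nucleus is measured on what is left.
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise again so the surviving tokens form a distribution to sample from.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


next_token_probs = {"Paris": 0.5, "London": 0.3, "Rome": 0.15, "Oslo": 0.05}
filtered = filter_top_k_top_p(next_token_probs, top_k=3, top_p=0.7)
print(filtered)  # only "Paris" and "London" survive, in a 5:3 ratio
```

With top_k=3 the token "Oslo" is dropped first; the nucleus cut at top_p=0.7 then removes "Rome", because "Paris" and "London" together already exceed 0.7 of the remaining mass. With do_sample=False neither filter matters, since the most probable token is chosen directly.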
lm_polygraph.utils.manager module
- class lm_polygraph.utils.manager.UEManager(data: Dataset, model: Model, estimators: List[Estimator], generation_metrics: List[GenerationMetric], ue_metrics: List[UEMetric], processors: List[Processor], train_data: Dataset | None = None, background_train_data: Dataset | None = None, ignore_exceptions: bool = True, ensemble_model: WhiteboxModel | None = None, deberta_batch_size: int = 10, deberta_device: str | None = None, language: str = 'en', verbose: bool = True, max_new_tokens: int = 100, background_train_dataset_max_new_tokens: int = 100, cache_path='/home/docs/.cache')[source]
Bases: object
Manager to conduct uncertainty estimation experiments by using several uncertainty methods, ground-truth uncertainty values and correlation metrics at once. Used for running benchmarks.
Examples:
>>> from lm_polygraph import WhiteboxModel
>>> from lm_polygraph.utils.dataset import Dataset
>>> from lm_polygraph.utils.manager import UEManager
>>> from lm_polygraph.estimators import *
>>> from lm_polygraph.ue_metrics import *
>>> from lm_polygraph.generation_metrics import *
>>> model = WhiteboxModel.from_pretrained(
...     'bigscience/bloomz-560m',
...     device='cuda:0',
... )
>>> dataset = Dataset.load(
...     '../workdir/data/triviaqa.csv',
...     'question', 'answer',
...     batch_size=4,
... )
>>> ue_methods = [MaximumSequenceProbability(), SemanticEntropy()]
>>> ue_metrics = [RiskCoverageCurveAUC()]
>>> ground_truth = [RougeMetric('rougeL'), BartScoreSeqMetric('rh')]
>>> man = UEManager(dataset, model, ue_methods, ground_truth, ue_metrics, processors=[])
>>> results = man()
>>> results.save("./manager.man")
- calculate(batch_stats: dict, calculators: list, inp_texts: list) dict[source]
Runs stat calculators and handles errors if any occur. Returns the updated batch stats.
- Parameters:
- batch_stats (dict): contains current batch statistics to be updated.
- calculators (list): list of stat calculators to run.
- inp_texts (list): list of inputs to the model in the batch.
- estimate(batch_stats: dict, estimators: list) Dict[Tuple[str, str], List[float]][source]
Runs uncertainty estimators and handles errors if any occur. Returns the estimations computed for the batch.
- Parameters:
- batch_stats (dict): contains current batch statistics used by the estimators.
- estimators (list): list of estimators to run.
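The "run each component and handle errors" pattern described for calculate and estimate can be sketched as follows. This is only an illustration under stated assumptions: run_calculators is a hypothetical helper, the calculators are plain functions taking (stats, texts), and a failing calculator is simply logged and skipped, which may differ from what UEManager actually does on failure.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("ue_sketch")

Calculator = Callable[[Dict[str, object], List[str]], Dict[str, object]]


def run_calculators(
    batch_stats: Dict[str, object],
    calculators: List[Calculator],
    inp_texts: List[str],
) -> Dict[str, object]:
    """Run calculators in order, merging their outputs and skipping failures."""
    for calculator in calculators:
        try:
            new_stats = calculator(batch_stats, inp_texts)
        except Exception as exc:  # assumption: failures are tolerated, not re-raised
            log.warning("calculator %s failed: %s", calculator.__name__, exc)
            continue
        # Later calculators can read what earlier ones produced.
        batch_stats.update(new_stats)
    return batch_stats


def count_chars(stats: Dict[str, object], texts: List[str]) -> Dict[str, object]:
    return {"n_chars": [len(text) for text in texts]}


def broken(stats: Dict[str, object], texts: List[str]) -> Dict[str, object]:
    raise RuntimeError("simulated failure")


def mean_chars(stats: Dict[str, object], texts: List[str]) -> Dict[str, object]:
    counts = stats["n_chars"]
    return {"mean_chars": sum(counts) / len(counts)}


stats = run_calculators({}, [count_chars, broken, mean_chars], ["ab", "abcd"])
print(stats)
```

Because the statistics accumulate in one dictionary, the order of the calculators matters: mean_chars depends on n_chars having been computed first, which is the kind of ordering a registry of stat calculators has to resolve.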
- class lm_polygraph.utils.manager.UncertaintyOutput(uncertainty: float | List[float], input_text: str, generation_text: str, generation_tokens: List[int], model_path: str, estimator: str)[source]
Bases: object
Uncertainty estimator output.
- Parameters:
- uncertainty (float | List[float]): uncertainty estimation.
- input_text (str): text used as model input.
- generation_text (str): text generated by the model.
- model_path (str): path to the model used in generation.
- estimator: str
- generation_text: str
- generation_tokens: List[int]
- input_text: str
- model_path: str
- uncertainty: float | List[float]
- lm_polygraph.utils.manager.estimate_uncertainty(model: Model, estimator: Estimator, input_text: str) UncertaintyOutput[source]
Estimates the uncertainty of the model generation using the provided estimator.
- Parameters:
- model (Model): model to estimate uncertainty of. Either a lm_polygraph.WhiteboxModel or a lm_polygraph.BlackboxModel can be used.
- estimator (Estimator): uncertainty estimation method to use. Can be any of the methods in lm_polygraph.estimators.
- input_text (str): text to estimate uncertainty of.
- Returns:
UncertaintyOutput: uncertainty estimation float along with supporting info.
Examples:
>>> from lm_polygraph import WhiteboxModel
>>> from lm_polygraph.estimators import LexicalSimilarity
>>> from lm_polygraph.utils.manager import estimate_uncertainty
>>> model = WhiteboxModel.from_pretrained(
...     'bigscience/bloomz-560m',
...     device='cpu',
... )
>>> estimator = LexicalSimilarity('rougeL')
>>> estimate_uncertainty(model, estimator, input_text='Who is George Bush?')
UncertaintyOutput(uncertainty=-0.9176470588235295, input_text='Who is George Bush?', generation_text=' President of the United States', model_path='bigscience/bloomz-560m')

>>> from lm_polygraph import BlackboxModel
>>> from lm_polygraph.estimators import EigValLaplacian
>>> from lm_polygraph.utils.manager import estimate_uncertainty
>>> model = BlackboxModel.from_openai(
...     'YOUR_OPENAI_TOKEN',
...     'gpt-3.5-turbo'
... )
>>> estimator = EigValLaplacian()
>>> estimate_uncertainty(model, estimator, input_text='When did Albert Einstein die?')
UncertaintyOutput(uncertainty=1.0022274826855433, input_text='When did Albert Einstein die?', generation_text='Albert Einstein died on April 18, 1955.', model_path='gpt-3.5-turbo')
lm_polygraph.utils.model module
- class lm_polygraph.utils.model.BlackboxModel(openai_api_key: str | None = None, model_path: str | None = None, hf_api_token: str | None = None, parameters: GenerationParameters = GenerationParameters(temperature=1.0, top_k=50, top_p=1.0, do_sample=False, num_beams=1, presence_penalty=0.0, repetition_penalty=1.0, generate_until=(), allow_newlines=True))[source]
Bases: Model
Black-box model class. Has no access to model scores and logits. Currently implemented blackbox models: OpenAI models, Huggingface models.
Examples:
>>> from lm_polygraph import BlackboxModel
>>> model = BlackboxModel.from_openai(
...     'YOUR_OPENAI_TOKEN',
...     'gpt-3.5-turbo'
... )

>>> from lm_polygraph import BlackboxModel
>>> model = BlackboxModel.from_huggingface(
...     hf_api_token='YOUR_API_TOKEN',
...     hf_model_id='google/t5-large-ssm-nqo'
... )
- static from_huggingface(hf_api_token: str, hf_model_id: str, **kwargs)[source]
Initializes a blackbox model from Huggingface.
- Parameters:
- hf_api_token (str): Huggingface API token.
- hf_model_id (str): model path in Huggingface.
- static from_openai(openai_api_key: str, model_path: str, **kwargs)[source]
Initializes a blackbox model from the OpenAI API.
- Parameters:
- openai_api_key (str): OpenAI API key.
- model_path (str): model name in OpenAI.
- class lm_polygraph.utils.model.Model(model_path: str, model_type: str)[source]
Bases: ABC
Abstract model class. Used as the base class for both white-box and black-box models.
- class lm_polygraph.utils.model.WhiteboxModel(model: AutoModelForCausalLM, tokenizer: AutoTokenizer, model_path: str | None = None, model_type: str = 'CausalLM', generation_parameters: GenerationParameters = GenerationParameters(temperature=1.0, top_k=50, top_p=1.0, do_sample=False, num_beams=1, presence_penalty=0.0, repetition_penalty=1.0, generate_until=(), allow_newlines=True))[source]
Bases: Model
White-box model class. Has access to model scores and logits. Currently implemented only for Huggingface models.
Examples:
>>> from lm_polygraph import WhiteboxModel
>>> model = WhiteboxModel.from_pretrained(
...     "bigscience/bloomz-3b",
... )
- static from_pretrained(model_path: str, generation_params: Dict | None = {}, add_bos_token: bool = True, **kwargs)[source]
Initializes the model from HuggingFace. Automatically determines the model type.
- Parameters:
- model_path (str): model path in HuggingFace.
- generation_params (Dict): generation arguments for lm_polygraph.utils.generation_parameters.GenerationParameters.
- add_bos_token (bool): tokenizer argument. Default: True.
- generate(**args)[source]
Generates the model output with scores from batch formed by HF Tokenizer.
- Parameters:
**args: Any arguments that can be passed to model.generate function from HuggingFace.
- Returns:
ModelOutput: HuggingFace generation output with scores overridden with original probabilities.
- generate_texts(input_texts: List[str], **args) List[str][source]
Generates a list of model answers using input texts batch.
- Parameters:
input_texts (List[str]): input texts batch.
- Returns:
List[str]: corresponding model generations. Has the same length as input_texts.
- tokenize(texts: List[str] | List[List[Dict[str, str]]]) Dict[str, Tensor][source]
Tokenizes input texts batch into a dictionary using the model tokenizer.
- Parameters:
texts (List[str]): list of input texts batch.
- Returns:
dict[str, torch.Tensor]: tensors dictionary obtained by tokenizing input texts batch.
- lm_polygraph.utils.model.create_ensemble(models: List[WhiteboxModel] = [], mc: bool = False, seed: int = 1, mc_seeds: List[int] = [1], ensembling_mode: str = 'pe', dropout_rate: float = 0.1, **kwargs) WhiteboxModel[source]
lm_polygraph.utils.normalize module
- lm_polygraph.utils.normalize.filter_nans(gen_metrics: ndarray, ues: ndarray) Tuple[ndarray, ndarray][source]
Filters out NaNs from gen_metrics and ues if they occur at least in one of the arrays.
Args:
gen_metrics: Array of gen_metrics
ues: Array of ues
Returns:
Tuple of two arrays:
- First array contains gen_metrics with NaNs removed
- Second array contains ues with NaNs removed
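The description above is read here as: a position is dropped from both arrays when either array holds a NaN there, so the two outputs stay aligned. The sketch below shows that interpretation on plain Python lists to stay dependency-free; the real function takes NumPy arrays, and filter_nan_pairs is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def filter_nan_pairs(
    gen_metrics: Sequence[float], ues: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Drop every position where either sequence holds a NaN."""
    kept = [
        (metric, ue)
        for metric, ue in zip(gen_metrics, ues)
        if not (math.isnan(metric) or math.isnan(ue))
    ]
    return [metric for metric, _ in kept], [ue for _, ue in kept]


nan = float("nan")
clean_metrics, clean_ues = filter_nan_pairs(
    [0.9, nan, 0.4, 0.7],
    [0.1, 0.5, nan, 0.3],
)
print(clean_metrics, clean_ues)  # positions 1 and 2 are removed from both lists
```

Filtering both sequences with the same mask matters because the downstream quality metrics compare them element by element.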
- lm_polygraph.utils.normalize.get_mans_ues_metrics(man_paths: List[str], ue_method_names: List[str], gen_metric_names: List[str]) Tuple[Dict[str, ndarray], Dict[str, ndarray]][source]
Extracts and concatenates data from a list of paths to saved manager data files.
Args:
man_paths: List of paths to manager data files
ue_method_names: List of UE methods to extract
gen_metric_names: List of gen_metrics to extract
Returns:
Tuple of two dictionaries:
- First dictionary contains UE method data, where keys are method names and values are concatenated arrays of UE method data from all managers
- Second dictionary contains gen_metric data, where keys are metric names and values are concatenated arrays of gen_metric data from all managers
lm_polygraph.utils.ood_detection module
lm_polygraph.utils.openai_chat module
lm_polygraph.utils.processor module
- class lm_polygraph.utils.processor.Logger[source]
Bases: Processor
Processor logging batch information to stdout.
- class lm_polygraph.utils.processor.Processor[source]
Bases: object
Abstract class to perform actions after processing a new batch of texts.
- on_batch(batch_stats: Dict[str, ndarray], batch_gen_metrics: Dict[Tuple[str, str], List[float]], batch_estimations: Dict[Tuple[str, str], List[float]])[source]
Processes new batch.
- Parameters:
- batch_stats (Dict[str, np.ndarray]): Dictionary of statistics calculated with stat_calculators.
- batch_gen_metrics (Dict[Tuple[str, str], List[float]]): Dictionary of generation metrics calculated for the batch. Dictionary keys consist of UE level (sequence or token) and generation metrics name.
- batch_estimations (Dict[Tuple[str, str], List[float]]): Dictionary of UE estimations calculated for the batch. Dictionary keys consist of UE level (sequence or token) and UE estimator name.
- on_eval(metrics: Dict[Tuple[str, str, str, str], float])[source]
Processes newly calculated evaluation metrics.
- Parameters:
- metrics (Dict[Tuple[str, str, str, str], float]): metrics calculated using ue_metrics on the batch which was considered at the last on_batch call. Dictionary keys consist of the UE level, the estimator name, the generation metrics name and the name of the ue_metrics entry used to calculate quality metrics between this estimator's uncertainty estimations and the generation metric outputs.
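A custom processor built on the on_batch and on_eval hooks might look like the sketch below. To stay self-contained it defines its own ProcessorSketch base with the same method names instead of importing lm_polygraph. MeanUncertaintyTracker is hypothetical, and the exact strings used in the dictionary keys are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (UE level, name), as described for on_batch


class ProcessorSketch:
    """Stand-in base class with the two hooks described above."""

    def on_batch(
        self,
        batch_stats: Dict[str, object],
        batch_gen_metrics: Dict[Key, List[float]],
        batch_estimations: Dict[Key, List[float]],
    ) -> None:
        pass

    def on_eval(self, metrics: Dict[Tuple[str, str, str, str], float]) -> None:
        pass


class MeanUncertaintyTracker(ProcessorSketch):
    """Accumulates estimations over batches and reports a running mean per key."""

    def __init__(self) -> None:
        self.values: Dict[Key, List[float]] = {}

    def on_batch(self, batch_stats, batch_gen_metrics, batch_estimations) -> None:
        for key, estimations in batch_estimations.items():
            self.values.setdefault(key, []).extend(estimations)

    def means(self) -> Dict[Key, float]:
        return {key: sum(vals) / len(vals) for key, vals in self.values.items()}


tracker = MeanUncertaintyTracker()
tracker.on_batch(
    {},
    {("sequence", "RougeL"): [0.5, 0.7]},
    {("sequence", "MaximumSequenceProbability"): [1.0, 3.0]},
)
tracker.on_batch({}, {}, {("sequence", "MaximumSequenceProbability"): [2.0]})
print(tracker.means())
```

A processor of this kind would be passed through the processors argument of UEManager, which receives a list of Processor instances.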
lm_polygraph.utils.register_stat_calculators module
- lm_polygraph.utils.register_stat_calculators.register_stat_calculators(deberta_batch_size: int = 10, deberta_device: str | None = None, language: str = 'en', n_ccp_alternatives: int = 10, cache_path='/home/docs/.cache', model: Model | None = None) Tuple[Dict[str, StatCalculator], Dict[str, List[str]]][source]
Registers all available statistic calculators to be seen by UEManager for properly organizing the calculations order.
lm_polygraph.utils.token_restoration module
- lm_polygraph.utils.token_restoration.collect_sample_token_level_uncertainties(model_output, batch_size, num_return_sequences, vocab_size, pad_token_id, length_penalty=1.0, ensemble_uncertainties={})[source]