lm_polygraph.utils.manager module

class lm_polygraph.utils.manager.UEManager(data: Dataset, model: Model, estimators: List[Estimator], builder_env_stat_calc: BuilderEnvironmentStatCalculator, available_stat_calculators: List[StatCalculatorContainer], generation_metrics: List[GenerationMetric], ue_metrics: List[UEMetric], processors: List[Processor], ignore_exceptions: bool = True, verbose: bool = True, max_new_tokens: int = 100, log_time: bool = False, save_stats: List[str] = [])

Bases: object

Manager to conduct uncertainty estimation experiments by using several uncertainty methods, ground-truth uncertainty values and correlation metrics at once. Used for running benchmarks.

Examples:

```python
>>> from lm_polygraph import WhiteboxModel
>>> from lm_polygraph.utils.manager import UEManager
>>> from lm_polygraph.utils.dataset import Dataset
>>> from lm_polygraph.estimators import *
>>> from lm_polygraph.ue_metrics import *
>>> from lm_polygraph.generation_metrics import *
>>> model = WhiteboxModel.from_pretrained(
...     'bigscience/bloomz-560m',
...     device='cuda:0',
... )
>>> dataset = Dataset.load(
...     '../workdir/data/triviaqa.csv',
...     'question', 'answer',
...     batch_size=4,
... )
>>> ue_methods = [MaximumSequenceProbability(), SemanticEntropy()]
>>> ue_metrics = [RiskCoverageCurveAUC()]
>>> ground_truth = [RougeMetric('rougeL'), BartScoreSeqMetric('rh')]
>>> man = UEManager(dataset, model, ue_methods, ground_truth, ue_metrics, processors=[])
>>> results = man()
>>> results.save("./manager.man")
```

calculate(batch_stats: dict, calculators: list, inp_texts: list) -> dict

Runs the stat calculators and handles any errors that occur. Returns the updated batch stats.

Parameters:

batch_stats (dict): contains current batch statistics to be updated
calculators (list): list of stat calculators to run
inp_texts (list): list of inputs to the model in the batch
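The run-and-recover pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `run_calculators`, the plain-callable calculator interface, and the two toy calculators are hypothetical names invented for this example, and the `ignore_exceptions` flag only mirrors the constructor parameter of the same name.

```python
from typing import Callable, List

# Hypothetical stand-in: a "calculator" is any callable that receives the
# current batch stats and the input texts and returns a dict of new stats.
Calculator = Callable[[dict, List[str]], dict]


def run_calculators(
    batch_stats: dict,
    calculators: List[Calculator],
    inp_texts: List[str],
    ignore_exceptions: bool = True,
) -> dict:
    """Run each calculator in order and merge its output into batch_stats."""
    for calc in calculators:
        try:
            new_stats = calc(batch_stats, inp_texts)
        except Exception as exc:
            if not ignore_exceptions:
                raise
            # Report the failure and move on to the next calculator.
            print(f"Skipping {getattr(calc, '__name__', calc)}: {exc}")
            continue
        batch_stats.update(new_stats)
    return batch_stats


def count_words(stats: dict, texts: List[str]) -> dict:
    # Toy calculator: number of whitespace-separated words per input.
    return {"n_words": [len(t.split()) for t in texts]}


def broken(stats: dict, texts: List[str]) -> dict:
    # Toy calculator that always fails, to exercise the error path.
    raise RuntimeError("calculator failed")


stats = run_calculators({}, [count_words, broken], ["a b c", "d e"])
```

In this sketch a failing calculator is skipped and the stats gathered so far are kept; passing `ignore_exceptions=False` re-raises the error instead.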

estimate(batch_stats: dict, estimators: list) -> Dict[Tuple[str, str], List[float]]

Runs the estimators on the batch statistics and handles any errors that occur. Returns the resulting estimates as a dictionary that maps pairs of strings to lists of floats.

Parameters:

batch_stats (dict): contains current batch statistics passed to the estimators
estimators (list): list of estimators to run
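The return type `Dict[Tuple[str, str], List[float]]` can be illustrated with a small stand-alone sketch. Everything here is an assumption made for the example: `ToyEstimator`, its `level` and `name` attributes, the choice of `(level, name)` as the dictionary key, and the `log_probs` statistic are hypothetical and are not taken from the library.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class ToyEstimator:
    """Hypothetical estimator: a named function of the batch stats."""

    def __init__(self, level: str, name: str, fn: Callable[[dict], List[float]]):
        self.level = level
        self.name = name
        self.fn = fn

    def __call__(self, batch_stats: dict) -> List[float]:
        return self.fn(batch_stats)


def run_estimators(
    batch_stats: dict, estimators: List[ToyEstimator]
) -> Dict[Tuple[str, str], List[float]]:
    """Collect per-sample estimates, skipping estimators that raise."""
    estimations: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for est in estimators:
        try:
            values = est(batch_stats)
        except Exception as exc:
            print(f"Skipping {est.name}: {exc}")
            continue
        estimations[(est.level, est.name)] += values
    return dict(estimations)


batch_stats = {"log_probs": [[-0.5, -0.25], [-1.0]]}
estimators = [
    # Negative sum of token log-probabilities, one value per sample.
    ToyEstimator("sequence", "NegLogProb",
                 lambda s: [-sum(lp) for lp in s["log_probs"]]),
    # Reads a statistic that is absent, so it raises KeyError and is skipped.
    ToyEstimator("sequence", "NeedsMissingStat",
                 lambda s: s["missing_stat"]),
]
estimates = run_estimators(batch_stats, estimators)
```

Keying the result by a pair of strings lets estimates from several estimators be stored side by side and extended batch by batch.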

eval_ue()
init()
static load(load_path: str, builder_env_stat_calc: BuilderEnvironmentStatCalculator = None, available_stat_calculators: List[StatCalculatorContainer] = None) -> UEManager

Loads UEManager from the specified path. To save the calculated manager results, see UEManager.save().

Parameters:

load_path (str): Path to file with saved benchmark results to load.

save(save_path: str)

Saves the run results in the provided path. To load the saved manager, see UEManager.load().

Parameters:

save_path (str): Path to file to save benchmark results to.

lm_polygraph.utils.manager.order_calculators(stats: List[str], stat_calculators: Dict[str, StatCalculator], stat_dependencies: Dict[str, List[str]]) -> Tuple[List[str], Set[str]]
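The page gives no description for this function, but its signature suggests that it arranges the requested statistics so that each one comes after the statistics it depends on. The sketch below shows one common way to do that, a depth-first dependency ordering. It is an assumption-laden illustration rather than the library's algorithm: `order_by_dependencies` is a hypothetical name, it omits the `stat_calculators` argument, and the statistic names in the example are made up.

```python
from typing import Dict, List, Set, Tuple


def order_by_dependencies(
    stats: List[str], stat_dependencies: Dict[str, List[str]]
) -> Tuple[List[str], Set[str]]:
    """Return stats in dependency-first order plus the set of stats covered."""
    ordered: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(stat: str) -> None:
        if stat in done:
            return
        if stat in in_progress:
            raise ValueError(f"Cyclic dependency involving {stat!r}")
        if stat not in stat_dependencies:
            raise KeyError(f"No dependency entry for {stat!r}")
        in_progress.add(stat)
        for dep in stat_dependencies[stat]:
            visit(dep)  # dependencies are emitted before the stat itself
        in_progress.discard(stat)
        done.add(stat)
        ordered.append(stat)

    for stat in stats:
        visit(stat)
    return ordered, done


# Illustrative dependency table (names are assumptions for this example).
deps = {
    "tokens": [],
    "log_probs": ["tokens"],
    "entropy": ["log_probs"],
}
order, covered = order_by_dependencies(["entropy"], deps)
```

Requesting only `entropy` pulls in `tokens` and `log_probs` ahead of it, which matches the shape of the `Tuple[List[str], Set[str]]` return value: an ordered list and the set of statistics that will be available.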