lm_polygraph.generation_metrics package

Submodules

lm_polygraph.generation_metrics.accuracy module

class lm_polygraph.generation_metrics.accuracy.AccuracyMetric(target_ignore_regex=None, output_ignore_regex=None, normalize=False)

Bases: GenerationMetric

Calculates accuracy between model-generated texts and ground-truth texts. Two texts are considered equal if their string representations are equal.
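The comparison can be sketched in plain Python. This is a minimal sketch, not the class's implementation: it assumes that `target_ignore_regex` and `output_ignore_regex` are patterns deleted from the target and the generated text before comparison, and that `normalize=True` means stripping surrounding whitespace and lower-casing. The helper names are hypothetical.

```python
import re
from typing import List, Optional


def _prepare(text: str, ignore_regex: Optional[str], normalize: bool) -> str:
    # Assumption: the ignore regex marks spans to delete before comparing.
    if ignore_regex is not None:
        text = re.sub(ignore_regex, "", text)
    # Assumption: normalization = strip surrounding whitespace + lower-case.
    if normalize:
        text = text.strip().lower()
    return text


def exact_match_accuracy(
    outputs: List[str],
    targets: List[str],
    output_ignore_regex: Optional[str] = None,
    target_ignore_regex: Optional[str] = None,
    normalize: bool = False,
) -> List[float]:
    """Return 1.0 where the prepared strings are identical, else 0.0."""
    scores = []
    for out, tgt in zip(outputs, targets):
        out = _prepare(out, output_ignore_regex, normalize)
        tgt = _prepare(tgt, target_ignore_regex, normalize)
        scores.append(1.0 if out == tgt else 0.0)
    return scores


print(exact_match_accuracy(["Paris.", "Lyon"], ["Paris", "Paris"],
                           output_ignore_regex=r"\.$", normalize=True))  # [1.0, 0.0]
```

Without `normalize`, "paris" and "Paris" would count as different, since the check is a plain string equality.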

lm_polygraph.generation_metrics.aggregated_metric module

class lm_polygraph.generation_metrics.aggregated_metric.AggregatedMetric(base_metric: GenerationMetric, aggregation: str = 'max')

Bases: GenerationMetric

Aggregated metric class, which wraps a base metric and aggregates its results for multi-target datasets.
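For a dataset where each input has several acceptable references, the wrapping can be sketched as scoring the generation against every target and reducing those scores with the chosen aggregation. This sketch assumes `aggregation` names a reduction such as `'max'` (the default shown above) or `'mean'`; the function name and the set of supported reductions are illustrative assumptions, not the class's actual interface.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical mapping from aggregation name to reduction function.
_REDUCTIONS = {"max": max, "min": min, "mean": mean}


def aggregate_over_targets(
    base_metric: Callable[[str, str], float],
    outputs: List[str],
    targets: List[List[str]],
    aggregation: str = "max",
) -> List[float]:
    """Score each output against all of its targets and reduce the scores."""
    reduce = _REDUCTIONS[aggregation]
    return [
        reduce([base_metric(out, tgt) for tgt in tgts])
        for out, tgts in zip(outputs, targets)
    ]


def exact(out: str, tgt: str) -> float:
    # Toy base metric: exact string match.
    return float(out == tgt)


print(aggregate_over_targets(exact, ["NYC"], [["New York", "NYC"]]))  # [1.0]
```

With `'max'`, a generation gets full credit if it matches any one reference; `'mean'` would give 0.5 here because only one of the two references matches.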

lm_polygraph.generation_metrics.alignscore module

class lm_polygraph.generation_metrics.alignscore.AlignScore(lang='en', ckpt_path='https://huggingface.co/yzha/AlignScore/resolve/main/AlignScore-large.ckpt', batch_size=16, target_is_claims=True)

Bases: GenerationMetric

Calculates AlignScore metric (https://aclanthology.org/2023.acl-long.634/) between model-generated texts and ground truth texts.

lm_polygraph.generation_metrics.alignscore_utils module

class lm_polygraph.generation_metrics.alignscore_utils.AlignScorer(model: str, batch_size: int, device: int, ckpt_path: str, evaluation_mode='nli_sp', verbose=True)

Bases: object

score(contexts: List[str], claims: List[str]) → List[float]
class lm_polygraph.generation_metrics.alignscore_utils.BERTAlignModel(model='roberta-large', using_pretrained=True, *args, **kwargs)

Bases: Module

configure_optimizers()

Prepare the optimizer and learning-rate schedule (linear warmup and decay).
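A linear warmup-and-decay schedule scales the base learning rate by a factor that rises from 0 to 1 over the warmup steps and then falls linearly back to 0 at the final training step. The sketch below computes that factor in plain Python; the step counts and base rate are illustrative, and whether `configure_optimizers` uses exactly this shape is an assumption.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    # Decay: shrink linearly from 1 to 0 at total_steps, never below 0.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 1e-5  # illustrative value
for step in (0, 50, 100, 550, 1000):
    print(step, base_lr * linear_warmup_decay(step, warmup_steps=100, total_steps=1000))
```

The multiplier is 0.5 halfway through warmup (step 50), peaks at 1.0 when warmup ends (step 100), and is back to 0.5 halfway through the decay (step 550).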

electra_forward(batch)
forward(batch)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mse_loss(input, target, ignored_index=-100.0, reduction='mean')
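Judging only by its signature, `mse_loss` appears to be a mean-squared error that skips positions whose target equals `ignored_index` (default `-100.0`), similar in spirit to the `ignore_index` convention of PyTorch's cross-entropy loss. The plain-Python sketch below assumes that masking behaviour and supports the `'mean'` and `'sum'` reductions; it is an illustration under those assumptions, not the method's source.

```python
from typing import List


def masked_mse(
    inputs: List[float],
    targets: List[float],
    ignored_index: float = -100.0,
    reduction: str = "mean",
) -> float:
    """MSE over positions whose target is not the ignore marker (assumed semantics)."""
    sq_errors = [
        (x - t) ** 2 for x, t in zip(inputs, targets) if t != ignored_index
    ]
    if reduction == "sum":
        return sum(sq_errors)
    if reduction == "mean":
        # Assumption: return 0.0 when every position is ignored.
        return sum(sq_errors) / len(sq_errors) if sq_errors else 0.0
    raise ValueError(f"unsupported reduction: {reduction!r}")


print(masked_mse([0.9, 0.2, 0.5], [1.0, -100.0, 0.0]))  # about 0.13
```

The middle position is excluded, so the mean is taken over two squared errors (0.01 and 0.25) rather than three.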
training_step(train_batch, batch_idx)
training_step_end(step_output)
validation_epoch_end(outputs)
validation_step(val_batch, batch_idx)
validation_step_end(step_output)
class lm_polygraph.generation_metrics.alignscore_utils.ElectraDiscriminatorPredictions(config)

Bases: Module

Prediction module for the discriminator, made up of two dense layers.

forward(discriminator_hidden_states)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class lm_polygraph.generation_metrics.alignscore_utils.Inferencer(ckpt_path='https://huggingface.co/yzha/AlignScore/resolve/main/AlignScore-large.ckpt', model='bert-base-uncased', batch_size=32, device='cuda', verbose=True)

Bases: object

batch_tokenize(premise, hypo)

The input premise and hypo are lists.

chunks(lst, n)

Yield successive n-sized chunks from lst.
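A generator with this docstring is commonly written as below. This is a standalone sketch of the idiom (the method itself also takes `self`), not its verbatim source.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunks(lst: Sequence[T], n: int) -> Iterator[Sequence[T]]:
    """Yield successive n-sized chunks from lst; the last chunk may be shorter."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Because it is a generator, batches are produced lazily, so a long list of premise/hypothesis pairs never has to be copied into batches all at once.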

inference(premise, hypo)

Runs inference on a list of premises and hypotheses.

Standard aggregation.

inference_example_batch(premise: list, hypo: list)

Runs inference on a single example whose premise and hypo are lists, using self.inference to batch the process.

SummaC-style aggregation.
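The AlignScore paper describes this style of aggregation as splitting the context into chunks and the claim into sentences, taking the best-aligned chunk for each claim sentence and averaging over sentences. The sketch below assumes `inference_example_batch` follows that max-then-mean rule; the word-overlap `toy_align` function is a hypothetical stand-in for the neural alignment model.

```python
from statistics import mean
from typing import Callable, List


def summac_style_score(
    premise_chunks: List[str],
    hypo_sentences: List[str],
    align: Callable[[str, str], float],
) -> float:
    """Max over premise chunks for each hypothesis sentence, then mean over sentences."""
    per_sentence = [
        max(align(chunk, sent) for chunk in premise_chunks)
        for sent in hypo_sentences
    ]
    return mean(per_sentence)


def toy_align(chunk: str, sentence: str) -> float:
    # Hypothetical scorer: share of sentence words that also occur in the chunk.
    words = set(sentence.split())
    return len(words & set(chunk.split())) / len(words)


chunks = ["the cat sat on the mat", "it was a sunny day"]
sentences = ["the cat sat", "it was raining"]
print(summac_style_score(chunks, sentences, toy_align))
```

Here the first sentence is fully supported by the first chunk (score 1.0), the second is best matched by the second chunk (2/3), and the overall score is their mean, 5/6.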

inference_per_example(premise: str, hypo: str)

Runs inference on a single example whose premise and hypo are strings, using self.inference to batch the process.

inference_reg(premise, hypo)

Runs inference on a list of premises and hypotheses.

Standard aggregation.

nlg_eval(premise, hypo)
smart_doc(premise: list, hypo: list)

Runs inference on a single example whose premise and hypo are lists, using self.inference to batch the process.

SMART-style aggregation.

smart_l(premise, hypo)
smart_n(premise, hypo)
class lm_polygraph.generation_metrics.alignscore_utils.ModelOutput(loss: torch.FloatTensor | None = None, all_loss: list | None = None, loss_nums: list | None = None, prediction_logits: torch.FloatTensor = None, seq_relationship_logits: torch.FloatTensor = None, tri_label_logits: torch.FloatTensor = None, reg_label_logits: torch.FloatTensor = None, hidden_states: Tuple[torch.FloatTensor] | None = None, attentions: Tuple[torch.FloatTensor] | None = None)

Bases: object

all_loss: list | None = None
attentions: Tuple[FloatTensor] | None = None
hidden_states: Tuple[FloatTensor] | None = None
loss: FloatTensor | None = None
loss_nums: list | None = None
prediction_logits: FloatTensor = None
reg_label_logits: FloatTensor = None
seq_relationship_logits: FloatTensor = None
tri_label_logits: FloatTensor = None

lm_polygraph.generation_metrics.bart_score module

class lm_polygraph.generation_metrics.bart_score.BartScoreSeqMetric(score_type: str = 'rh', device=None, max_length=256, checkpoint='facebook/bart-large-cnn')

Bases: GenerationMetric

Calculates BARTScore metric (https://arxiv.org/abs/2106.11520) between model-generated texts and ground truth texts.
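The BARTScore paper scores a text y = (y_1, ..., y_m) given a text x as a weighted sum of token log-probabilities under a sequence-to-sequence model with parameters theta. The block below restates that definition and its uniform-weight special case, the average token log-likelihood. Whether BartScoreSeqMetric averages in exactly this way, and which text plays the role of x for a given score_type such as the default 'rh', are assumptions this page does not settle.

```latex
\mathrm{BARTScore}(\mathbf{x}, \mathbf{y})
  = \sum_{t=1}^{m} \omega_t \log p\left(y_t \mid \mathbf{y}_{<t}, \mathbf{x}, \theta\right),
\qquad
\omega_t = \frac{1}{m}
\;\Longrightarrow\;
\mathrm{BARTScore}(\mathbf{x}, \mathbf{y})
  = \frac{1}{m} \sum_{t=1}^{m} \log p\left(y_t \mid \mathbf{y}_{<t}, \mathbf{x}, \theta\right)
```

Since every term is a log-probability, the score is at most 0, and values closer to 0 mean the model finds y more likely given x.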

load(path=None)

Load the model weights obtained from paraphrase fine-tuning.

score(srcs, tgts, batch_size=4)

Score a batch of examples.

test(batch_size=3)

Test

lm_polygraph.generation_metrics.bert_score module

class lm_polygraph.generation_metrics.bert_score.BertScoreMetric(lang='en')

Bases: GenerationMetric

Calculates BERTScore metric (https://arxiv.org/abs/1904.09675) between model-generated texts and ground truth texts.
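BERTScore embeds both texts with a pretrained encoder and greedily matches tokens by cosine similarity. The sketch below shows only that matching step on hand-made toy vectors (no encoder, no idf weighting, no baseline rescaling). Which of precision, recall or F1 `BertScoreMetric` reports is not stated here and is left as an assumption.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _cos(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def bertscore_from_embeddings(cand: List[Vector], ref: List[Vector]) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over token embeddings."""
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(_cos(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(_cos(c, r) for r in ref) for c in cand) / len(cand)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-d "token embeddings": the candidate has one extra, unrelated token.
print(bertscore_from_embeddings(cand=[[1.0, 0.0], [0.0, 1.0]], ref=[[1.0, 0.0]]))
```

The reference token is perfectly covered (recall 1.0), but half of the candidate tokens have no counterpart (precision 0.5), giving F1 = 2/3.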

lm_polygraph.generation_metrics.bleu module

class lm_polygraph.generation_metrics.bleu.BLEUMetric

Bases: GenerationMetric

Calculates BLEU metric between model-generated texts and ground truth texts.
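BLEU combines clipped n-gram precisions (n = 1 to 4 by default) through a geometric mean and multiplies the result by a brevity penalty. The sentence-level sketch below uses whitespace tokenization, a single reference and no smoothing. Real BLEU implementations differ in tokenization and smoothing, and which one `BLEUMetric` relies on is not stated here.

```python
import math
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence BLEU with one reference and whitespace tokenization."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = _ngrams(hyp, n), _ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat", "the cat sat on the mat", max_n=2))
```

In the second call every unigram and bigram of the hypothesis appears in the reference, so the score equals the brevity penalty alone, exp(1 - 6/3) = exp(-1).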

lm_polygraph.generation_metrics.comet module

class lm_polygraph.generation_metrics.comet.Comet(source_ignore_regex=None, lang='en')

Bases: GenerationMetric

Calculates COMET metric (https://aclanthology.org/2020.emnlp-main.213/) between model-generated texts and ground truth texts.

lm_polygraph.generation_metrics.generation_metric module

class lm_polygraph.generation_metrics.generation_metric.GenerationMetric(**kwargs)

Bases: ABC

Abstract generation metric class, which measures ground-truth uncertainty by comparing model-generated text with dataset ground-truth text. This ground-truth uncertainty is further compared with different estimators’ uncertainties in UEManager using ue_metrics.
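The pattern of an abstract base class that concrete metrics subclass can be sketched as follows. The class names, the `__call__(stats, target_texts)` signature and the `"greedy_texts"` key are hypothetical choices for illustration and are not taken from `GenerationMetric` itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchGenerationMetric(ABC):
    """Hypothetical stand-in for an abstract generation metric."""

    @abstractmethod
    def __call__(self, stats: Dict[str, List[str]], target_texts: List[str]) -> List[float]:
        """Return one quality score per generated text."""


class ExactMatch(SketchGenerationMetric):
    def __call__(self, stats: Dict[str, List[str]], target_texts: List[str]) -> List[float]:
        # Assumption: generated texts are stored under a hypothetical "greedy_texts" key.
        return [float(g == t) for g, t in zip(stats["greedy_texts"], target_texts)]


print(ExactMatch()({"greedy_texts": ["a", "b"]}, ["a", "c"]))  # [1.0, 0.0]
```

Because the base class declares the scoring method abstract, it cannot be instantiated directly; each concrete metric supplies its own comparison, and the per-text scores are what later get compared with the estimators' uncertainties.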

lm_polygraph.generation_metrics.model_score module

class lm_polygraph.generation_metrics.model_score.ModelScoreSeqMetric

Bases: GenerationMetric

Calculates sequence-level ModelScore metric between model-generated texts and ground truth texts. For each ground-truth text ‘r’ and model-generated text ‘h’, the method measures the sum of log-probabilities of generating ‘h’ given the prompt ‘Paraphrase “{r}”’, normalized by the length of ‘h’.

class lm_polygraph.generation_metrics.model_score.ModelScoreTokenwiseMetric

Bases: GenerationMetric

Calculates token-level ModelScore metric between model-generated texts and ground truth texts. For each ground-truth text ‘r’ and model-generated text ‘h’, the method measures the log-probabilities of the tokens of ‘h’ given the prompt ‘Paraphrase “{r}”’.
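Given the log-probabilities that a scoring model assigns to the tokens of h after the paraphrase prompt, the two ModelScore variants differ only in how those values are reported. The sketch assumes the per-token log-probabilities are already computed (the model call is omitted) and that "length" means the number of tokens of h; the function names are hypothetical.

```python
from typing import List


def paraphrase_prompt(reference: str) -> str:
    # Prompt template described above: Paraphrase "{r}"
    return f'Paraphrase "{reference}"'


def model_score_seq(token_logprobs: List[float]) -> float:
    """Sequence level: sum of token log-probabilities of h divided by its length."""
    return sum(token_logprobs) / len(token_logprobs)


def model_score_tokenwise(token_logprobs: List[float]) -> List[float]:
    """Token level: the per-token log-probabilities of h themselves."""
    return list(token_logprobs)


print(paraphrase_prompt("The sky is blue."))  # Paraphrase "The sky is blue."
print(model_score_seq([-0.5, -1.0, -1.5]))  # -1.0
```

Length normalization keeps the sequence-level score comparable across generations of different lengths, since a raw sum of log-probabilities becomes more negative with every extra token.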

lm_polygraph.generation_metrics.openai_fact_check module

class lm_polygraph.generation_metrics.openai_fact_check.OpenAIFactCheck(openai_model: str = 'gpt-4o', cache_path: str = '/home/docs/.cache', language: str = 'en')

Bases: GenerationMetric

Determines for each claim whether it is true or not, using the OpenAI model specified in lm_polygraph.stat_calculators.openai_chat.OpenAIChat.

lm_polygraph.generation_metrics.rouge module

class lm_polygraph.generation_metrics.rouge.RougeMetric(rouge_name)

Bases: GenerationMetric

Calculates the ROUGE metric between model-generated texts and ground truth texts.
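The `rouge_name` argument presumably selects a ROUGE variant; the names `'rouge1'`, `'rouge2'` and `'rougeL'` used by Google's `rouge-score` package are an assumption here. The plain-Python sketch computes ROUGE-N and ROUGE-L F1 with lower-cased whitespace tokens and no stemming, so its numbers can differ slightly from library output.

```python
from collections import Counter
from typing import List


def _f1(overlap: int, hyp_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / hyp_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(hypothesis: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Counter "&" keeps the minimum count of each shared n-gram.
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    return _f1(overlap, sum(hyp_ngrams.values()), sum(ref_ngrams.values()))


def _lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic programme for the longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(hypothesis: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    return _f1(_lcs_length(hyp, ref), len(hyp), len(ref))


hyp = "the cat was found under the bed"
ref = "the cat was under the bed"
print(rouge_n(hyp, ref, 1), rouge_n(hyp, ref, 2), rouge_l(hyp, ref))
```

For this pair, ROUGE-1 and ROUGE-L both equal 12/13 (six of seven hypothesis tokens match, in order), while ROUGE-2 drops to 8/11 because the inserted word "found" breaks two bigrams.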

lm_polygraph.generation_metrics.sbert module

class lm_polygraph.generation_metrics.sbert.SbertMetric

Bases: GenerationMetric

Module contents