lm_polygraph.generation_metrics package
Submodules
lm_polygraph.generation_metrics.accuracy module
- class lm_polygraph.generation_metrics.accuracy.AccuracyMetric(target_ignore_regex=None, output_ignore_regex=None, normalize=False)[source]
Bases:
GenerationMetric
Calculates accuracy between model-generated texts and ground-truth texts. Two texts are considered equal if their string representations are equal.
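As a rough illustration of exact-match accuracy, the sketch below compares two strings after optional regex stripping and normalization. The helper name exact_match_accuracy and the specific normalization (lowercasing and trimming whitespace) are assumptions made for illustration, not the library's implementation; only the parameter names mirror the signature above.

```python
import re
from typing import Optional


def exact_match_accuracy(
    output: str,
    target: str,
    target_ignore_regex: Optional[str] = None,
    output_ignore_regex: Optional[str] = None,
    normalize: bool = False,
) -> int:
    """Return 1 if the two texts are equal as strings, else 0 (hypothetical helper)."""
    if target_ignore_regex is not None:
        # Drop every span of the target that matches the ignore pattern.
        target = re.sub(target_ignore_regex, "", target)
    if output_ignore_regex is not None:
        # Drop every span of the model output that matches the ignore pattern.
        output = re.sub(output_ignore_regex, "", output)
    if normalize:
        # Assumed normalization: lowercase and strip surrounding whitespace.
        output, target = output.strip().lower(), target.strip().lower()
    return int(output == target)
```

For instance, with output_ignore_regex=r"^Answer:\s*" the output "Answer: 42" would match the target "42".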
lm_polygraph.generation_metrics.aggregated_metric module
- class lm_polygraph.generation_metrics.aggregated_metric.AggregatedMetric(base_metric: GenerationMetric, aggregation: str = 'max')[source]
Bases:
GenerationMetric
Aggregated metric class, which wraps a base metric and aggregates its results for multi-target datasets.
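The idea of aggregating over several reference texts can be sketched as follows. This is a minimal illustration, assuming the base metric can be treated as a function of one output and one target; the function name aggregate_over_targets is hypothetical, and only the 'max' mode is shown because it is the only aggregation value visible in the signature above.

```python
from typing import Callable, List


def aggregate_over_targets(
    base_metric: Callable[[str, str], float],
    output: str,
    targets: List[str],
    aggregation: str = "max",
) -> float:
    """Score one output against every target and combine the scores (hypothetical helper)."""
    scores = [base_metric(output, target) for target in targets]
    if aggregation == "max":
        # The output is credited with its best-matching reference.
        return max(scores)
    # Other aggregation modes are not described in this reference, so none are guessed here.
    raise ValueError(f"Unsupported aggregation: {aggregation!r}")
```

With an exact-match base metric, an output that equals any one of the accepted answers would receive the full score under 'max'.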
lm_polygraph.generation_metrics.alignscore module
- class lm_polygraph.generation_metrics.alignscore.AlignScore(lang='en', ckpt_path='https://huggingface.co/yzha/AlignScore/resolve/main/AlignScore-large.ckpt', batch_size=16, target_is_claims=True)[source]
Bases:
GenerationMetric
Calculates AlignScore metric (https://aclanthology.org/2023.acl-long.634/) between model-generated texts and ground truth texts.
lm_polygraph.generation_metrics.alignscore_utils module
- class lm_polygraph.generation_metrics.alignscore_utils.AlignScorer(model: str, batch_size: int, device: int, ckpt_path: str, evaluation_mode='nli_sp', verbose=True)[source]
Bases:
object
- class lm_polygraph.generation_metrics.alignscore_utils.BERTAlignModel(model='roberta-large', using_pretrained=True, *args, **kwargs)[source]
Bases:
Module
- forward(batch)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
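The note above can be illustrated without PyTorch. The class below is a toy stand-in, not torch.nn.Module: its hooks list and __call__ logic are simplified assumptions that only mimic the general pattern of calling the instance so that registered hooks run around forward().

```python
class ToyModule:
    """Toy stand-in showing why the instance, not forward(), should be called."""

    def __init__(self):
        self.hooks = []  # simplified hook registry (assumption, not the PyTorch API)

    def forward(self, x):
        # The actual computation lives here.
        return x * 2

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        out = self.forward(x)
        for hook in self.hooks:
            hook(self, x, out)
        return out


calls = []
m = ToyModule()
m.hooks.append(lambda module, inp, out: calls.append((inp, out)))

m(3)          # runs forward() and the hook
m.forward(3)  # runs forward() only; the hook is silently skipped
```

After both calls, only the first one has been recorded by the hook.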
- class lm_polygraph.generation_metrics.alignscore_utils.ElectraDiscriminatorPredictions(config)[source]
Bases:
Module
Prediction module for the discriminator, made up of two dense layers.
- forward(discriminator_hidden_states)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class lm_polygraph.generation_metrics.alignscore_utils.Inferencer(ckpt_path='https://huggingface.co/yzha/AlignScore/resolve/main/AlignScore-large.ckpt', model='bert-base-uncased', batch_size=32, device='cuda', verbose=True)[source]
Bases:
object
- inference_example_batch(premise: list, hypo: list)[source]
Runs inference on a batch of examples (premise: list, hypo: list), using self.inference to batch the process.
Uses SummaC-style aggregation.
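A minimal sketch of what a SummaC-style aggregation could look like, assuming it refers to the max-then-mean scheme: for each hypothesis sentence, keep the score of its best-supporting premise chunk, then average over sentences. The function name and the score_matrix layout are assumptions for illustration; how the text is split and scored is outside this sketch.

```python
from typing import List


def summac_style_aggregate(score_matrix: List[List[float]]) -> float:
    """Aggregate chunk-vs-sentence scores (hypothetical helper).

    score_matrix[i][j] is the alignment score of premise chunk i
    against hypothesis sentence j.
    """
    if not score_matrix or not score_matrix[0]:
        raise ValueError("score_matrix must be non-empty")
    n_sentences = len(score_matrix[0])
    # For every hypothesis sentence, take the maximum over premise chunks.
    best_per_sentence = [
        max(row[j] for row in score_matrix) for j in range(n_sentences)
    ]
    # Average the per-sentence maxima into one example-level score.
    return sum(best_per_sentence) / n_sentences
```

Under this assumption, a hypothesis is scored highly only if each of its sentences is supported by at least one premise chunk.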
- inference_per_example(premise: str, hypo: str)[source]
Runs inference on a single example (premise: str, hypo: str), using self.inference to batch the process.
- class lm_polygraph.generation_metrics.alignscore_utils.ModelOutput(loss: torch.FloatTensor | None = None, all_loss: list | None = None, loss_nums: list | None = None, prediction_logits: torch.FloatTensor = None, seq_relationship_logits: torch.FloatTensor = None, tri_label_logits: torch.FloatTensor = None, reg_label_logits: torch.FloatTensor = None, hidden_states: Tuple[torch.FloatTensor] | None = None, attentions: Tuple[torch.FloatTensor] | None = None)[source]
Bases:
object
- all_loss: list | None = None
- attentions: Tuple[FloatTensor] | None = None
- loss: FloatTensor | None = None
- loss_nums: list | None = None
- prediction_logits: FloatTensor = None
- reg_label_logits: FloatTensor = None
- seq_relationship_logits: FloatTensor = None
- tri_label_logits: FloatTensor = None
lm_polygraph.generation_metrics.bart_score module
- class lm_polygraph.generation_metrics.bart_score.BartScoreSeqMetric(score_type: str = 'rh', device=None, max_length=256, checkpoint='facebook/bart-large-cnn')[source]
Bases:
GenerationMetric
Calculates BARTScore metric (https://arxiv.org/abs/2106.11520) between model-generated texts and ground truth texts.
lm_polygraph.generation_metrics.bert_score module
- class lm_polygraph.generation_metrics.bert_score.BertScoreMetric(lang='en')[source]
Bases:
GenerationMetric
Calculates BERTScore metric (https://arxiv.org/abs/1904.09675) between model-generated texts and ground truth texts.
lm_polygraph.generation_metrics.bleu module
- class lm_polygraph.generation_metrics.bleu.BLEUMetric[source]
Bases:
GenerationMetric
Calculates BLEU metric between model-generated texts and ground truth texts.
lm_polygraph.generation_metrics.comet module
- class lm_polygraph.generation_metrics.comet.Comet(source_ignore_regex=None, lang='en')[source]
Bases:
GenerationMetric
Calculates COMET metric (https://aclanthology.org/2020.emnlp-main.213/) between model-generated texts and ground truth texts.
lm_polygraph.generation_metrics.generation_metric module
- class lm_polygraph.generation_metrics.generation_metric.GenerationMetric(**kwargs)[source]
Bases:
ABC
Abstract generation metric class, which measures ground-truth uncertainty by comparing model-generated text with dataset ground-truth text. This ground-truth uncertainty is further compared with different estimators’ uncertainties in UEManager using ue_metrics.
lm_polygraph.generation_metrics.model_score module
- class lm_polygraph.generation_metrics.model_score.ModelScoreSeqMetric[source]
Bases:
GenerationMetric
Calculates the sequence-level ModelScore metric between model-generated texts and ground truth texts. For each ground-truth text ‘r’ and model-generated text ‘h’, the method measures the sum of the log-probabilities of generating ‘h’ given the prompt ‘Paraphrase “{r}”’, normalized by the length of ‘h’.
- class lm_polygraph.generation_metrics.model_score.ModelScoreTokenwiseMetric[source]
Bases:
GenerationMetric
Calculates the token-level ModelScore metric between model-generated texts and ground truth texts. For each ground-truth text ‘r’ and model-generated text ‘h’, the method measures the log-probabilities of generating ‘h’ given the prompt ‘Paraphrase “{r}”’.
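The two ModelScore variants above differ only in how per-token log-probabilities are reduced. The sketch below assumes those log-probabilities have already been obtained from a language model (not shown) and that the length of the generation is measured in tokens; the function names are hypothetical.

```python
from typing import List


def paraphrase_prompt(reference: str) -> str:
    """Build the prompt described above for a ground-truth text r."""
    return f'Paraphrase "{reference}"'


def tokenwise_score(token_logprobs: List[float]) -> List[float]:
    """Token-level variant: keep one log-probability per generated token."""
    return list(token_logprobs)


def sequence_score(token_logprobs: List[float]) -> float:
    """Sequence-level variant: summed log-probability divided by the length.

    Assumption: the length is the number of tokens in the generation.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return sum(token_logprobs) / len(token_logprobs)
```

A higher (less negative) sequence score would indicate that the model finds the generation a more likely paraphrase of the reference.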
lm_polygraph.generation_metrics.openai_fact_check module
- class lm_polygraph.generation_metrics.openai_fact_check.OpenAIFactCheck(openai_model: str = 'gpt-4o', cache_path: str = '/home/docs/.cache', language: str = 'en')[source]
Bases:
GenerationMetric
Determines, for each claim, whether it is true or not, using the OpenAI model specified in lm_polygraph.stat_calculators.openai_chat.OpenAIChat.
lm_polygraph.generation_metrics.rouge module
- class lm_polygraph.generation_metrics.rouge.RougeMetric(rouge_name)[source]
Bases:
GenerationMetric
Calculates ROUGE metric between model-generated texts and ground truth texts.
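The values accepted by rouge_name are not listed here. As one example of what a ROUGE variant measures, the sketch below computes a ROUGE-1 style F-measure from unigram overlap. It assumes plain lowercase whitespace tokenization with no stemming, so its numbers may differ from those of a full ROUGE implementation; the function name is hypothetical.

```python
from collections import Counter


def rouge1_f(output: str, target: str) -> float:
    """Unigram-overlap F-measure between an output and a reference (hypothetical helper)."""
    out_tokens = output.lower().split()
    ref_tokens = target.lower().split()
    if not out_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each unigram counts at most as often as it appears in both texts.
    overlap = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

An output that covers only part of the reference gets full precision but reduced recall, which lowers the F-measure.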
lm_polygraph.generation_metrics.sbert module
- class lm_polygraph.generation_metrics.sbert.SbertMetric[source]
Bases:
GenerationMetric