lm_polygraph.utils.ensemble_utils package

Submodules

lm_polygraph.utils.ensemble_utils.dropout module

class lm_polygraph.utils.ensemble_utils.dropout.ConsistentDropout(p: float = 0.5, inplace: bool = False)[source]

Bases: Dropout

forward(input: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_share_across_tokens(input: Tensor) Tensor[source]
identity(input: Tensor) Tensor[source]
lm_polygraph.utils.ensemble_utils.dropout.functional_dropout(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) Tensor[source]
lm_polygraph.utils.ensemble_utils.dropout.functional_dropout_share(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) Tensor[source]
lm_polygraph.utils.ensemble_utils.dropout.replace_dropout(model_name, module, p=0.1, share_across_tokens=True)[source]
lm_polygraph.utils.ensemble_utils.dropout.replace_with_identity(module)[source]
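
The names forward_share_across_tokens and functional_dropout_share suggest a dropout variant that reuses one mask for every token position. This page does not describe the implementation, so the following is only a minimal pure-Python sketch of that idea under this assumption. The helpers dropout_mask, dropout_per_element and dropout_shared are hypothetical and are not part of lm_polygraph.

```python
import random


def dropout_mask(size, p, rng):
    """Return `size` scale factors: 0.0 with probability p, else 1 / (1 - p)."""
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else keep_scale for _ in range(size)]


def dropout_per_element(tokens, p, seed):
    """Ordinary dropout: an independent mask is drawn for every token position."""
    rng = random.Random(seed)
    return [
        [x * m for x, m in zip(row, dropout_mask(len(row), p, rng))]
        for row in tokens
    ]


def dropout_shared(tokens, p, seed):
    """Assumed 'shared' variant: one mask over the hidden dimension, reused for all tokens."""
    rng = random.Random(seed)
    mask = dropout_mask(len(tokens[0]), p, rng)
    return [[x * m for x, m in zip(row, mask)] for row in tokens]


tokens = [[1.0] * 8 for _ in range(4)]  # 4 token positions, hidden size 8
shared = dropout_shared(tokens, p=0.5, seed=0)
again = dropout_shared(tokens, p=0.5, seed=0)
independent = dropout_per_element(tokens, p=0.5, seed=0)
```

Seeding the generator makes the mask reproducible, so the same ensemble member can be recreated across calls; whether the library seeds its masks this way is not stated here.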

lm_polygraph.utils.ensemble_utils.ensemble_beam module

class lm_polygraph.utils.ensemble_utils.ensemble_beam.BeamSearchEncoderDecoderOutput(sequences: LongTensor | None = None, sequences_scores: FloatTensor | None = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_beam_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, beam_indices: LongTensor | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]

Bases: ModelOutput

Base class for outputs of encoder-decoder generation models using beam search. Hidden states and attention weights of the encoder (respectively the decoder) can be accessed via the encoder_attentions and encoder_hidden_states attributes (respectively the decoder_attentions and decoder_hidden_states attributes).

Args:
sequences (torch.LongTensor of shape (batch_size*num_return_sequences, sequence_length)):

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

sequences_scores (torch.FloatTensor of shape (batch_size*num_return_sequences), optional, returned when output_scores=True is passed or when config.output_scores=True):

Final beam scores of the generated sequences.

scores (tuple(torch.FloatTensor) optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam transition scores for each vocabulary token at each generation step. Beam transition scores consist of log probabilities of tokens conditioned on the log softmax of previously generated tokens in this beam. Tuple of max_length-1 torch.FloatTensor elements, each of shape (batch_size*num_beams, config.vocab_size).

beam_indices (torch.LongTensor, optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam indices of generated token id at each generation step. torch.LongTensor of shape (batch_size*num_return_sequences, max_length-1).

encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).

encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size*num_beams*num_return_sequences, sequence_length, hidden_size).

decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, num_heads, generated_length, sequence_length).

cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, generated_length, hidden_size).

beam_indices: LongTensor | None = None
cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None
encoder_attentions: Tuple[FloatTensor] | None = None
encoder_hidden_states: Tuple[FloatTensor] | None = None
ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
models_beam_next_token_logits: Tuple[FloatTensor] | None = None
models_scores: Tuple[List[FloatTensor]] | None = None
pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
scores: Tuple[FloatTensor] | None = None
sequences: LongTensor = None
sequences_scores: FloatTensor | None = None
class lm_polygraph.utils.ensemble_utils.ensemble_beam.EnsembleBeamSearchMixin[source]

Bases: GenerationMixin

Averages the function across the ensemble models. Generates sequences of token ids for models with a language modeling head using beam search decoding and can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

Parameters:

input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

The sequence used as a prompt for the generation.

beam_scorer (BeamScorer):

A derived instance of [BeamScorer] that defines how beam hypotheses are constructed, stored and sorted during generation. For more information, see the documentation of [BeamScorer].

logits_processor (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.

stopping_criteria (StoppingCriteriaList, optional):

An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.

max_length (int, optional, defaults to 20):

DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

pad_token_id (int, optional):

The id of the padding token.

eos_token_id (int, optional):

The id of the end-of-sequence token.

output_attentions (bool, optional, defaults to False):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

output_hidden_states (bool, optional, defaults to False):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

output_scores (bool, optional, defaults to False):

Whether or not to return the prediction scores. See scores under returned tensors for more details.

return_dict_in_generate (bool, optional, defaults to False):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

synced_gpus (bool, optional, defaults to False):

Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

model_kwargs:

Additional model specific kwargs will be forwarded to the forward function of the model. If model is an encoder-decoder model the kwargs should include encoder_outputs.

Return:

[~generation_utils.BeamSearchDecoderOnlyOutput], [~generation_utils.BeamSearchEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour) or a [~generation_utils.BeamSearchDecoderOnlyOutput] if model.config.is_encoder_decoder=False and return_dict_in_generate=True or a [~generation_utils.BeamSearchEncoderDecoderOutput] if model.config.is_encoder_decoder=True.

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForSeq2SeqLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     BeamSearchScorer,
... )
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
>>> # let's run beam search using 3 beams
>>> num_beams = 3
>>> # define decoder start token ids
>>> input_ids = torch.ones((num_beams, 1), device=model.device, dtype=torch.long)
>>> input_ids = input_ids * model.config.decoder_start_token_id
>>> # add encoder_outputs to model keyword arguments
>>> model_kwargs = {
...     "encoder_outputs": model.get_encoder()(
...         encoder_input_ids.repeat_interleave(num_beams, dim=0), return_dict=True
...     )
... }
>>> # instantiate beam scorer
>>> beam_scorer = BeamSearchScorer(
...     batch_size=1,
...     num_beams=num_beams,
...     device=model.device,
... )
>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(5, eos_token_id=model.config.eos_token_id),
...     ]
... )
>>> outputs = model.beam_search(input_ids, beam_scorer, logits_processor=logits_processor, **model_kwargs)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Wie alt bist du?']
```
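
The class description says the ensemble averages across its member models before beam search proceeds. As a minimal sketch of one common way to do that (an assumption, not the confirmed implementation), the code below averages the next-token probabilities of several models and returns the result as log probabilities, which is the quantity a beam scorer would consume. The function names ensemble_log_probs and log_softmax are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ensemble_log_probs(per_model_logits):
    """Average next-token probabilities across models, returned in log space."""
    per_model = [log_softmax(logits) for logits in per_model_logits]
    n_models = len(per_model)
    vocab_size = len(per_model[0])
    averaged = []
    for v in range(vocab_size):
        column = [log_probs[v] for log_probs in per_model]
        m = max(column)
        # log(mean(exp(column))) computed as logsumexp(column) - log(n_models)
        lse = m + math.log(sum(math.exp(x - m) for x in column))
        averaged.append(lse - math.log(n_models))
    return averaged


logits_a = [2.0, 0.0, -1.0]  # model A prefers token 0
logits_b = [0.0, 2.0, -1.0]  # model B prefers token 1
avg = ensemble_log_probs([logits_a, logits_b])
```

Working in log space avoids underflow when individual probabilities are small, which matters once scores are accumulated over many generation steps.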

lm_polygraph.utils.ensemble_utils.ensemble_generator module

class lm_polygraph.utils.ensemble_utils.ensemble_generator.EnsembleGenerationMixin[source]

Bases: EnsembleBeamSearchMixin, EnsembleSampleMixin, EnsembleGreedyMixin, GenerationMixin

add_ensemble_models(models, devices)[source]
property base_seed
calculate_entropy_based_measures(enable=True)[source]
property ensembling_mode
property mc
property mc_models_num
property mc_seeds
property models
property models_beam_logits_iter
property tokenizer
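
This page does not define the values of ensembling_mode or the pe/ep prefixes of the pe_uncertainties and ep_uncertainties fields. Assuming they abbreviate "product of expectations" and "expectation of products" (an assumption, not confirmed here), the two ways of combining per-model token probabilities into a sequence probability can be sketched as follows. All names in the block are hypothetical.

```python
import math


def product_of_expectations(token_probs):
    """Average over models at each token, then multiply over tokens."""
    n_models = len(token_probs)
    n_tokens = len(token_probs[0])
    per_token_mean = [
        sum(token_probs[m][t] for m in range(n_models)) / n_models
        for t in range(n_tokens)
    ]
    return math.prod(per_token_mean)


def expectation_of_products(token_probs):
    """Multiply over tokens within each model, then average over models."""
    per_model_sequence = [math.prod(row) for row in token_probs]
    return sum(per_model_sequence) / len(per_model_sequence)


# token_probs[m][t]: probability model m assigns to the t-th generated token
token_probs = [
    [0.9, 0.8],
    [0.1, 0.2],
]
pe = product_of_expectations(token_probs)  # 0.5 * 0.5
ep = expectation_of_products(token_probs)  # (0.72 + 0.02) / 2
```

The two values differ whenever the models disagree in a correlated way across tokens, which is why the choice of mode affects both decoding and the resulting uncertainty estimates.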

lm_polygraph.utils.ensemble_utils.ensemble_greedy module

class lm_polygraph.utils.ensemble_utils.ensemble_greedy.EnsembleGreedyMixin[source]

Bases: GenerationMixin

Generates sequences of token ids for models with a language modeling head using greedy decoding and can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

Warning

In most cases, you do not need to call [~generation.GenerationMixin.greedy_search] directly. Use generate() instead. For an overview of generation strategies and code examples, check the [following guide](../generation_strategies).

Parameters:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

The sequence used as a prompt for the generation.

logits_processor (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.

stopping_criteria (StoppingCriteriaList, optional):

An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.

max_length (int, optional, defaults to 20):

DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

pad_token_id (int, optional):

The id of the padding token.

eos_token_id (Union[int, List[int]], optional):

The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.

output_attentions (bool, optional, defaults to False):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

output_hidden_states (bool, optional, defaults to False):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

output_scores (bool, optional, defaults to False):

Whether or not to return the prediction scores. See scores under returned tensors for more details.

return_dict_in_generate (bool, optional, defaults to False):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

synced_gpus (bool, optional, defaults to False):

Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

streamer (BaseStreamer, optional):

Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

model_kwargs:

Additional model specific keyword arguments will be forwarded to the forward function of the model. If model is an encoder-decoder model the kwargs should include encoder_outputs.

Return:

[~generation.GreedySearchDecoderOnlyOutput], [~generation.GreedySearchEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour) or a [~generation.GreedySearchDecoderOnlyOutput] if model.config.is_encoder_decoder=False and return_dict_in_generate=True or a [~generation.GreedySearchEncoderDecoderOutput] if model.config.is_encoder_decoder=True.

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForCausalLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     StoppingCriteriaList,
...     MaxLengthCriteria,
... )

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
>>> input_prompt = "It might be possible to"
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids
>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(10, eos_token_id=model.generation_config.eos_token_id),
...     ]
... )
>>> stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
>>> outputs = model.greedy_search(
...     input_ids, logits_processor=logits_processor, stopping_criteria=stopping_criteria
... )
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["It might be possible to get a better understanding of the nature of the problem, but it's not"]
```
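
To make the roles of logits_processor, stopping_criteria and eos_token_id concrete, here is a toy greedy loop in plain Python. It is a sketch only: toy_logits is a hypothetical stand-in for a model's forward pass, and min_length_processor only mimics the idea of blocking the end-of-sequence token until a minimum length is reached.

```python
def toy_logits(prefix):
    """Hypothetical model: fixed scores for token ids 0 (EOS), 1 and 2."""
    return [2.0, 1.0, 0.5]


def min_length_processor(min_length, eos_id):
    """Block EOS while the sequence is shorter than min_length."""
    def process(prefix, scores):
        if len(prefix) < min_length:
            scores = list(scores)
            scores[eos_id] = float("-inf")
        return scores
    return process


def greedy_search(prefix, processors, max_length, eos_id):
    """Append the highest-scoring token until EOS or max_length is reached."""
    prefix = list(prefix)
    while len(prefix) < max_length:  # plays the role of a max-length stopping criterion
        scores = toy_logits(prefix)
        for process in processors:
            scores = process(prefix, scores)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix


out = greedy_search([2], [min_length_processor(4, eos_id=0)], max_length=10, eos_id=0)
capped = greedy_search([2], [min_length_processor(100, eos_id=0)], max_length=3, eos_id=0)
print(out)     # [2, 1, 1, 1, 0]
print(capped)  # [2, 1, 1]
```

In the first call EOS is suppressed until the sequence holds four tokens and is then chosen immediately; in the second the length cap ends generation before EOS is ever allowed.
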
class lm_polygraph.utils.ensemble_utils.ensemble_greedy.GreedySearchEncoderDecoderOutput(sequences: LongTensor | None = None, sequences_scores: FloatTensor | None = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_hypo_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]

Bases: ModelOutput

Base class for outputs of encoder-decoder generation models using greedy search. Hidden states and attention weights of the encoder (respectively the decoder) can be accessed via the encoder_attentions and encoder_hidden_states attributes (respectively the decoder_attentions and decoder_hidden_states attributes).

Args:
sequences (torch.LongTensor of shape (batch_size, sequence_length)):

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

scores (tuple(torch.FloatTensor) optional, returned when output_scores=True is passed or when config.output_scores=True):

Processed prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) at each generation step. Tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size, config.vocab_size).

encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).

encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, generated_length, hidden_size).

cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None
encoder_attentions: Tuple[FloatTensor] | None = None
encoder_hidden_states: Tuple[FloatTensor] | None = None
ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
models_hypo_next_token_logits: Tuple[FloatTensor] | None = None
models_scores: Tuple[List[FloatTensor]] | None = None
pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
scores: Tuple[FloatTensor] | None = None
sequences: LongTensor = None
sequences_scores: FloatTensor | None = None
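
The pe_uncertainties and ep_uncertainties fields map measure names to per-step values, but the keys are not listed on this page. As a hedged illustration of the entropy-based measures commonly derived from an ensemble's next-token distributions, the sketch below computes total uncertainty (entropy of the mean), data uncertainty (mean of the entropies) and their difference, the mutual information. The function names and dictionary keys are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_measures(per_model_probs):
    """Entropy-based uncertainty measures for one generation step."""
    n_models = len(per_model_probs)
    vocab_size = len(per_model_probs[0])
    mean_probs = [
        sum(probs[v] for probs in per_model_probs) / n_models
        for v in range(vocab_size)
    ]
    total = entropy(mean_probs)
    data = sum(entropy(probs) for probs in per_model_probs) / n_models
    return {"total": total, "data": data, "mutual_information": total - data}


# Models agree on a flat distribution: all uncertainty is data uncertainty.
agree = entropy_measures([[0.5, 0.5], [0.5, 0.5]])
# Models are each confident but contradict each other: all uncertainty is disagreement.
disagree = entropy_measures([[1.0, 0.0], [0.0, 1.0]])
```

Both cases have the same total uncertainty (ln 2); only the decomposition tells them apart.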

lm_polygraph.utils.ensemble_utils.ensemble_sample module

class lm_polygraph.utils.ensemble_utils.ensemble_sample.EnsembleSampleMixin[source]

Bases: GenerationMixin

sample(input_ids: LongTensor, logits_processor: LogitsProcessorList | None = None, stopping_criteria: StoppingCriteriaList | None = None, logits_warper: LogitsProcessorList | None = None, max_length: int | None = None, pad_token_id: int | None = None, eos_token_id: int | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, output_scores: bool | None = None, return_dict_in_generate: bool | None = None, synced_gpus: bool | None = False, streamer: BaseStreamer | None = None, **model_kwargs) GenerateEncoderDecoderOutput | GenerateDecoderOnlyOutput | LongTensor[source]

Generates sequences of token ids for models with a language modeling head using multinomial sampling and can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

Parameters:

input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

The sequence used as a prompt for the generation.

logits_processor (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.

stopping_criteria (StoppingCriteriaList, optional):

An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.

logits_warper (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsWarper] used to warp the prediction score distribution of the language modeling head applied before multinomial sampling at each generation step.

max_length (int, optional, defaults to 20):

DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

pad_token_id (int, optional):

The id of the padding token.

eos_token_id (int, optional):

The id of the end-of-sequence token.

output_attentions (bool, optional, defaults to False):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

output_hidden_states (bool, optional, defaults to False):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

output_scores (bool, optional, defaults to False):

Whether or not to return the prediction scores. See scores under returned tensors for more details.

return_dict_in_generate (bool, optional, defaults to False):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

synced_gpus (bool, optional, defaults to False):

Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

model_kwargs:

Additional model specific kwargs will be forwarded to the forward function of the model. If model is an encoder-decoder model the kwargs should include encoder_outputs.

Return:

[~generation_utils.SampleDecoderOnlyOutput], [~generation_utils.SampleEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour) or a [~generation_utils.SampleDecoderOnlyOutput] if model.config.is_encoder_decoder=False and return_dict_in_generate=True or a [~generation_utils.SampleEncoderDecoderOutput] if model.config.is_encoder_decoder=True.

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForCausalLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     TopKLogitsWarper,
...     TemperatureLogitsWarper,
...     StoppingCriteriaList,
...     MaxLengthCriteria,
... )
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.config.pad_token_id = model.config.eos_token_id
>>> input_prompt = "Today is a beautiful day, and"
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids
>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(15, eos_token_id=model.config.eos_token_id),
...     ]
... )
>>> # instantiate logits warpers
>>> logits_warper = LogitsProcessorList(
...     [
...         TopKLogitsWarper(50),
...         TemperatureLogitsWarper(0.7),
...     ]
... )
>>> stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
>>> torch.manual_seed(0)  
>>> outputs = model.sample(
...     input_ids,
...     logits_processor=logits_processor,
...     logits_warper=logits_warper,
...     stopping_criteria=stopping_criteria,
... )
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Today is a beautiful day, and a wonderful day.\n\nI was lucky enough to meet the']
```
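
The logits_warper argument reshapes the score distribution before a token is drawn. The following plain-Python sketch shows the idea for the two warpers used in the example above, top-k filtering followed by temperature scaling, and then multinomial sampling. It is an illustration under simplifying assumptions, not the transformers implementation; top_k_warper, temperature_warper and sample_next are hypothetical names.

```python
import math
import random


def top_k_warper(k):
    """Keep the k highest scores (ties at the threshold are also kept)."""
    def warp(scores):
        threshold = sorted(scores, reverse=True)[k - 1]
        return [s if s >= threshold else float("-inf") for s in scores]
    return warp


def temperature_warper(temperature):
    """Divide scores by the temperature; values below 1 sharpen the distribution."""
    def warp(scores):
        return [s / temperature for s in scores]
    return warp


def sample_next(scores, warpers, rng):
    """Apply the warpers in order, then draw one token id from the softmax."""
    for warp in warpers:
        scores = warp(scores)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


rng = random.Random(0)  # seeded for reproducibility
scores = [3.0, 2.5, 0.1, -1.0]
warpers = [top_k_warper(2), temperature_warper(0.7)]
draws = [sample_next(scores, warpers, rng) for _ in range(200)]
```

With k=2 only token ids 0 and 1 can ever be drawn, and the temperature of 0.7 shifts more probability mass toward token 0 than the raw scores would.
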
class lm_polygraph.utils.ensemble_utils.ensemble_sample.SampleEncoderDecoderOutput(sequences: LongTensor | None = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_beam_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]

Bases: ModelOutput

Base class for outputs of encoder-decoder generation models using multinomial sampling. Hidden states and attention weights of the encoder (respectively the decoder) can be accessed via the encoder_attentions and encoder_hidden_states attributes (respectively the decoder_attentions and decoder_hidden_states attributes).

Args:
sequences (torch.LongTensor of shape (batch_size*num_return_sequences, sequence_length)):

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

sequences_scores (torch.FloatTensor of shape (batch_size*num_return_sequences), optional, returned when output_scores=True is passed or when config.output_scores=True):

Final beam scores of the generated sequences.

scores (tuple(torch.FloatTensor) optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam transition scores for each vocabulary token at each generation step. Beam transition scores consist of log probabilities of tokens conditioned on the log softmax of previously generated tokens in this beam. Tuple of max_length-1 torch.FloatTensor elements, each of shape (batch_size*num_beams, config.vocab_size).

beam_indices (tuple(tuple(torch.LongTensor)), optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam indices of generated token id at each generation step. torch.LongTensor of shape (batch_size*num_return_sequences, max_length-1).

encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).

encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size*num_beams*num_return_sequences, sequence_length, hidden_size).

decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, num_heads, generated_length, sequence_length).

cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, generated_length, hidden_size).

cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None
encoder_attentions: Tuple[FloatTensor] | None = None
encoder_hidden_states: Tuple[FloatTensor] | None = None
ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
models_beam_next_token_logits: Tuple[FloatTensor] | None = None
models_scores: Tuple[List[FloatTensor]] | None = None
pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
scores: Tuple[FloatTensor] | None = None
sequences: LongTensor = None

Module contents