lm_polygraph.utils.ensemble_utils.ensemble_greedy module

class lm_polygraph.utils.ensemble_utils.ensemble_greedy.EnsembleGreedyMixin[source]

Bases: GenerationMixin

Generates sequences of token ids for models with a language modeling head using greedy decoding and can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

Warning: In most cases, you do not need to call GenerationMixin.greedy_search() directly. Use generate() instead. For an overview of generation strategies and code examples, see the transformers guide on generation strategies.

Parameters:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

The sequence used as a prompt for the generation.

logits_processor (LogitsProcessorList, optional):

A LogitsProcessorList, that is, a list of instances of classes derived from LogitsProcessor. These are used to modify the prediction scores of the language modeling head at each generation step.

stopping_criteria (StoppingCriteriaList, optional):

A StoppingCriteriaList, that is, a list of instances of classes derived from StoppingCriteria. These are used to decide whether the generation loop should stop.

max_length (int, optional, defaults to 20):

DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

pad_token_id (int, optional):

The id of the padding token.

eos_token_id (Union[int, List[int]], optional):

The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.

output_attentions (bool, optional, defaults to False):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

output_hidden_states (bool, optional, defaults to False):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

output_scores (bool, optional, defaults to False):

Whether or not to return the prediction scores. See scores under returned tensors for more details.

return_dict_in_generate (bool, optional, defaults to False):

Whether or not to return a ModelOutput instead of a plain tuple.

synced_gpus (bool, optional, defaults to False):

Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

streamer (BaseStreamer, optional):

Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.

model_kwargs:

Additional model-specific keyword arguments, which are forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.

Return:

GreedySearchDecoderOnlyOutput, GreedySearchEncoderDecoderOutput or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour). When return_dict_in_generate=True, the result is instead a GreedySearchDecoderOnlyOutput if model.config.is_encoder_decoder=False, or a GreedySearchEncoderDecoderOutput if model.config.is_encoder_decoder=True.
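To make the roles of logits_processor, stopping_criteria and eos_token_id concrete, the following sketch runs a greedy loop in plain Python over a single sequence. It is an illustration under stated assumptions, not the code of EnsembleGreedyMixin: greedy_decode, next_token_logits and toy_model are hypothetical names, the minimum-length mask stands in for a logits processor, and the length check stands in for a stopping criterion.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    input_ids: Sequence[int],
    eos_token_id: int,
    max_length: int = 20,
    min_length: int = 0,
) -> List[int]:
    """Greedy decoding for one sequence (hypothetical helper, not library code)."""
    ids = list(input_ids)
    # Plays the role of a max-length stopping criterion.
    while len(ids) < max_length:
        logits = list(next_token_logits(ids))
        # Plays the role of a min-length logits processor:
        # EOS cannot be chosen while the sequence is too short.
        if len(ids) < min_length:
            logits[eos_token_id] = float("-inf")
        # Greedy step: pick the highest-scoring token (first one on ties).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids


def toy_model(ids: List[int]) -> List[float]:
    """Toy 4-token 'model' that prefers (last token + 1) % 4; token 3 acts as EOS."""
    target = (ids[-1] + 1) % 4
    return [1.0 if token == target else 0.0 for token in range(4)]


print(greedy_decode(toy_model, [0], eos_token_id=3))  # prints [0, 1, 2, 3]
```

Passing min_length=6 to the same call masks the EOS token until the sequence holds six tokens, so generation continues past the point where the toy model would otherwise stop.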

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForCausalLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     StoppingCriteriaList,
...     MaxLengthCriteria,
... )

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
>>> input_prompt = "It might be possible to"
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids
>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(10, eos_token_id=model.generation_config.eos_token_id),
...     ]
... )
>>> stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
>>> outputs = model.greedy_search(
...     input_ids, logits_processor=logits_processor, stopping_criteria=stopping_criteria
... )
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["It might be possible to get a better understanding of the nature of the problem, but it's not"]
```
class lm_polygraph.utils.ensemble_utils.ensemble_greedy.GreedySearchEncoderDecoderOutput(sequences: LongTensor = None, sequences_scores: FloatTensor | None = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_hypo_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]

Bases: ModelOutput

Base class for outputs of encoder-decoder generation models using greedy search. Hidden states and attention weights of the encoder (respectively the decoder) can be accessed via the encoder_attentions and encoder_hidden_states attributes (respectively the decoder_attentions and decoder_hidden_states attributes).

Args:
sequences (torch.LongTensor of shape (batch_size, sequence_length)):

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all sequences in the batch finished early due to the eos_token_id.

scores (tuple(torch.FloatTensor), optional, returned when output_scores=True is passed or when config.output_scores=True):

Processed prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) at each generation step. Tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size, config.vocab_size).

encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).

encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, generated_length, hidden_size).
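As a rough illustration of how the per-step scores tuple lines up with sequences, the sketch below computes the log-probability of each generated token. It uses nested Python lists in place of tensors, so scores[t][b] is the row of vocabulary scores for batch item b at step t. The helper names and the num_prefix_tokens parameter are hypothetical; the latter counts the tokens in sequences that precede the first generated token, which is assumed here to be a single decoder start token.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of vocabulary scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def chosen_token_log_probs(
    scores: Sequence[Sequence[Sequence[float]]],
    sequences: Sequence[Sequence[int]],
    num_prefix_tokens: int = 1,
) -> List[List[float]]:
    """Log-probability of each generated token (hypothetical helper).

    scores:    one entry per generation step, each of shape (batch_size, vocab_size)
    sequences: one row of token ids per batch item, prefix tokens included
    """
    result = []
    for b, seq in enumerate(sequences):
        generated = seq[num_prefix_tokens:]
        # Step t of `scores` corresponds to the t-th generated token.
        result.append(
            [log_softmax(scores[t][b])[tok] for t, tok in enumerate(generated)]
        )
    return result


# Two steps, batch of one, vocabulary of two tokens.
toy_scores = [[[0.0, 0.0]], [[math.log(3.0), 0.0]]]
toy_sequences = [[9, 1, 0]]  # 9 is the assumed start token
print(chosen_token_log_probs(toy_scores, toy_sequences))
```

Positions filled with padding after a sequence has finished still receive a value here, so in practice they would need to be masked out before summing.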

cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None
encoder_attentions: Tuple[FloatTensor] | None = None
encoder_hidden_states: Tuple[FloatTensor] | None = None
ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
models_hypo_next_token_logits: Tuple[FloatTensor] | None = None
models_scores: Tuple[List[FloatTensor]] | None = None
pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
scores: Tuple[FloatTensor] | None = None
sequences: LongTensor = None
sequences_scores: FloatTensor | None = None