lm_polygraph.utils.ensemble_utils.ensemble_greedy module
- class lm_polygraph.utils.ensemble_utils.ensemble_greedy.EnsembleGreedyMixin[source]
Bases: GenerationMixin
- greedy_search(input_ids: LongTensor, logits_processor: LogitsProcessorList | None = None, stopping_criteria: StoppingCriteriaList | None = None, max_length: int | None = None, pad_token_id: int | None = None, eos_token_id: int | List[int] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, output_scores: bool | None = None, return_dict_in_generate: bool | None = None, synced_gpus: bool = False, streamer: BaseStreamer | None = None, **model_kwargs) GenerateDecoderOnlyOutput | GenerateEncoderDecoderOutput | LongTensor[source]
Generates sequences of token ids for models with a language modeling head using greedy decoding. It can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.
Warning: In most cases, you do not need to call greedy_search() directly. Use generate() instead. For an overview of generation strategies and code examples, see the generation strategies guide (../generation_strategies).
- Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)):
The sequence used as a prompt for the generation.
- logits_processor (LogitsProcessorList, optional):
An instance of LogitsProcessorList: a list of instances of classes derived from LogitsProcessor, used to modify the prediction scores of the language modeling head at each generation step.
- stopping_criteria (StoppingCriteriaList, optional):
An instance of StoppingCriteriaList: a list of instances of classes derived from StoppingCriteria, used to decide whether the generation loop should stop.
- max_length (int, optional, defaults to 20):
The maximum length of the sequence to be generated. DEPRECATED: use logits_processor or stopping_criteria directly to cap the number of generated tokens.
- pad_token_id (int, optional):
The id of the padding token.
- eos_token_id (Union[int, List[int]], optional):
The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.
- output_attentions (bool, optional, defaults to False):
Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.
- output_hidden_states (bool, optional, defaults to False):
Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.
- output_scores (bool, optional, defaults to False):
Whether or not to return the prediction scores. See scores under returned tensors for more details.
- return_dict_in_generate (bool, optional, defaults to False):
Whether or not to return a ModelOutput instead of a plain tuple.
- synced_gpus (bool, optional, defaults to False):
Whether to continue running the while loop until max_length (needed for ZeRO stage 3).
- streamer (BaseStreamer, optional):
Streamer object that will be used to stream the generated sequences. Generated tokens are passed through streamer.put(token_ids) and the streamer is responsible for any further processing.
- model_kwargs:
Additional model-specific keyword arguments, which are forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.
- Return:
GreedySearchDecoderOnlyOutput, GreedySearchEncoderDecoderOutput or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour), a GreedySearchDecoderOnlyOutput if model.config.is_encoder_decoder=False and return_dict_in_generate=True, or a GreedySearchEncoderDecoderOutput if model.config.is_encoder_decoder=True.
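As a rough illustration of how input_ids, eos_token_id, pad_token_id and a length cap interact, the following is a minimal pure-Python sketch of batched greedy decoding. It is not the implementation in this module: greedy_decode and toy_logits are hypothetical names, the "model" is a stand-in function, and plain lists replace tensors. At each step the highest-scoring token is chosen, and rows that have already emitted the end-of-sequence token are filled with the padding token so the batch stays rectangular.

```python
from typing import Callable, List


def greedy_decode(
    step_logits: Callable[[List[int]], List[float]],
    prompts: List[List[int]],
    eos_token_id: int,
    pad_token_id: int,
    max_length: int,
) -> List[List[int]]:
    """Toy batched greedy search over plain Python lists (illustrative only)."""
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)
    # Assumption: all prompts have the same length, as in a padded batch.
    while len(sequences[0]) < max_length and not all(finished):
        for i, seq in enumerate(sequences):
            if finished[i]:
                # Finished rows only receive padding from now on.
                seq.append(pad_token_id)
                continue
            logits = step_logits(seq)
            # Greedy choice: index of the highest score.
            next_token = max(range(len(logits)), key=logits.__getitem__)
            seq.append(next_token)
            if next_token == eos_token_id:
                finished[i] = True
    return sequences


def toy_logits(seq: List[int]) -> List[float]:
    # Hypothetical stand-in for a model with a vocabulary of 5 tokens:
    # it always prefers the token (last token + 1) modulo 5.
    target = (seq[-1] + 1) % 5
    return [1.0 if t == target else 0.0 for t in range(5)]


out = greedy_decode(
    toy_logits, [[0], [2]], eos_token_id=4, pad_token_id=0, max_length=6
)
print(out)
```

The second row reaches the end-of-sequence token two steps before the first, so it is padded while the first row keeps generating; the loop ends once every row has finished, even though max_length has not been reached.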
Examples:
```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForCausalLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     StoppingCriteriaList,
...     MaxLengthCriteria,
... )

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")

>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id

>>> input_prompt = "It might be possible to"
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids

>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(10, eos_token_id=model.generation_config.eos_token_id),
...     ]
... )
>>> stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])

>>> outputs = model.greedy_search(
...     input_ids, logits_processor=logits_processor, stopping_criteria=stopping_criteria
... )

>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["It might be possible to get a better understanding of the nature of the problem, but it's not"]
```
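The two helper objects in the example above can be pictured with a small pure-Python sketch. It assumes only the general behaviour of a minimum-length processor (the end-of-sequence score is masked until the sequence is long enough) and of a maximum-length criterion (stop once the sequence reaches the cap). The function names min_length_processor, max_length_reached and toy_logits are hypothetical and are not part of Transformers or of this module.

```python
import math
from typing import List

EOS = 0  # hypothetical end-of-sequence token id


def min_length_processor(
    seq: List[int], logits: List[float], min_length: int, eos_token_id: int
) -> List[float]:
    """Mask the EOS score while the sequence is shorter than min_length."""
    if len(seq) < min_length:
        logits = list(logits)  # copy so the caller's scores stay untouched
        logits[eos_token_id] = -math.inf
    return logits


def max_length_reached(seq: List[int], max_length: int) -> bool:
    """Stopping criterion: true once the sequence has max_length tokens."""
    return len(seq) >= max_length


def toy_logits(seq: List[int]) -> List[float]:
    # Hypothetical model that always prefers EOS, then token 1, then token 2.
    return [3.0, 2.0, 1.0]


seq = [1, 2]  # prompt of two tokens
while not max_length_reached(seq, max_length=6):
    logits = min_length_processor(seq, toy_logits(seq), min_length=4, eos_token_id=EOS)
    token = max(range(len(logits)), key=logits.__getitem__)
    seq.append(token)
    if token == EOS:
        break
print(seq)
```

Without the processor the toy model would stop immediately; with it, the second-best token is chosen until the sequence holds four tokens, after which EOS is allowed and generation ends before the length cap.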
- class lm_polygraph.utils.ensemble_utils.ensemble_greedy.GreedySearchEncoderDecoderOutput(sequences: LongTensor = None, sequences_scores: FloatTensor | None = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_hypo_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]
Bases: ModelOutput
Base class for outputs of encoder-decoder generation models using greedy search. Hidden states and attention weights of the encoder (respectively the decoder) can be accessed via the encoder_attentions and encoder_hidden_states attributes (respectively the decoder_attentions and decoder_hidden_states attributes).
- Args:
- sequences (torch.LongTensor of shape (batch_size, sequence_length)):
The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.
- scores (tuple(torch.FloatTensor), optional, returned when output_scores=True is passed or when config.output_scores=True):
Processed prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) at each generation step. Tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size, config.vocab_size).
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):
Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):
Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
- decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):
Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).
- cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):
Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).
- decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):
Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, generated_length, hidden_size).
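To make the layout of scores concrete, here is a minimal sketch that turns per-step scores into per-token log-probabilities. It uses nested Python lists in place of tensors, so scores[t][b] is the list of vocabulary scores for batch element b at generation step t. The helper names log_softmax and token_log_probs are hypothetical, and generated is assumed to hold only the newly generated tokens, one list per batch element.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_log_probs(
    scores: Sequence[Sequence[Sequence[float]]],
    generated: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return log p(token) for each generated token.

    scores[t][b] holds the scores at step t for batch element b;
    generated[b][t] is the token chosen for batch element b at step t.
    """
    return [
        [log_softmax(scores[t][b])[generated[b][t]] for t in range(len(scores))]
        for b in range(len(generated))
    ]


# Two generation steps, batch size 1, vocabulary size 2.
scores = (
    [[0.0, 0.0]],             # step 0: both tokens equally likely
    [[math.log(3.0), 0.0]],   # step 1: token 0 is three times as likely
)
generated = [[1, 0]]
log_probs = token_log_probs(scores, generated)
print([[round(math.exp(lp), 2) for lp in row] for row in log_probs])
```

Summing each row of log_probs gives a sequence-level log-probability under the processed scores; whether that matches sequences_scores in this class is not stated in the documentation above.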
- cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
- decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
- encoder_attentions: Tuple[FloatTensor] | None = None
- ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
- models_hypo_next_token_logits: Tuple[FloatTensor] | None = None
- models_scores: Tuple[List[FloatTensor]] | None = None
- pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
- scores: Tuple[FloatTensor] | None = None
- sequences: LongTensor = None
- sequences_scores: FloatTensor | None = None