lm_polygraph.utils.ensemble_utils.ensemble_sample module

class lm_polygraph.utils.ensemble_utils.ensemble_sample.EnsembleSampleMixin[source]

Bases: GenerationMixin

sample(input_ids: LongTensor, logits_processor: LogitsProcessorList | None = None, stopping_criteria: StoppingCriteriaList | None = None, logits_warper: LogitsProcessorList | None = None, max_length: int | None = None, pad_token_id: int | None = None, eos_token_id: int | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, output_scores: bool | None = None, return_dict_in_generate: bool | None = None, synced_gpus: bool | None = False, streamer: BaseStreamer | None = None, **model_kwargs) GenerateDecoderOnlyOutput | GenerateEncoderDecoderOutput | LongTensor[source]

Generates sequences of token ids for models with a language modeling head using multinomial sampling. Can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.
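As a rough illustration of what "multinomial sampling" means here, the sketch below draws a token id with probability proportional to the softmax of its score. It is framework-free and does not reproduce this method's code; `softmax` and `sample_next_token` are hypothetical helper names introduced only for this example.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, rng):
    """Draw one token id with probability proportional to softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
# A score of -inf marks a token that an earlier filtering step has ruled out.
logits = [2.0, 1.0, 0.1, float("-inf")]
probs = softmax(logits)
draws = [sample_next_token(logits, rng) for _ in range(1000)]
```

Because the last token has probability zero, it never appears in `draws`, while the other ids appear roughly in proportion to `probs`.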

Parameters:

input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

The sequence used as a prompt for the generation.

logits_processor (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.

stopping_criteria (StoppingCriteriaList, optional):

An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.

logits_warper (LogitsProcessorList, optional):

An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsWarper] used to warp the prediction score distribution of the language modeling head applied before multinomial sampling at each generation step.

max_length (int, optional, defaults to 20):

DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

pad_token_id (int, optional):

The id of the padding token.

eos_token_id (int, optional):

The id of the end-of-sequence token.

output_attentions (bool, optional, defaults to False):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

output_hidden_states (bool, optional, defaults to False):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

output_scores (bool, optional, defaults to False):

Whether or not to return the prediction scores. See scores under returned tensors for more details.

return_dict_in_generate (bool, optional, defaults to False):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

synced_gpus (bool, optional, defaults to False):

Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

model_kwargs:

Additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.
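The interplay of logits_processor and logits_warper can be sketched as two chains of callables applied in sequence to the next-token scores. The sketch assumes the order used by the Hugging Face sampling loop (processors first, then warpers). The three factory functions are simplified, hypothetical stand-ins for MinLengthLogitsProcessor, TopKLogitsWarper, and TemperatureLogitsWarper, not the library classes themselves.

```python
NEG_INF = float("-inf")


def min_length_processor(min_length, eos_token_id):
    """Forbid EOS until the sequence holds at least `min_length` tokens."""
    def apply(input_ids, scores):
        if len(input_ids) < min_length:
            scores = list(scores)  # copy so the caller's list is untouched
            scores[eos_token_id] = NEG_INF
        return scores
    return apply


def top_k_warper(k):
    """Keep the k highest scores and mask everything below them."""
    def apply(input_ids, scores):
        threshold = sorted(scores, reverse=True)[min(k, len(scores)) - 1]
        return [s if s >= threshold else NEG_INF for s in scores]
    return apply


def temperature_warper(temperature):
    """Divide scores by the temperature; values below 1 sharpen the distribution."""
    def apply(input_ids, scores):
        return [s / temperature for s in scores]
    return apply


def run_chain(chain, input_ids, scores):
    """Apply each step of a processor or warper list in order."""
    for step in chain:
        scores = step(input_ids, scores)
    return scores


logits_processor = [min_length_processor(min_length=5, eos_token_id=3)]
logits_warper = [top_k_warper(2), temperature_warper(0.5)]

input_ids = [7, 8]                 # two tokens so far, below the minimum length
raw_scores = [1.0, 3.0, 2.0, 4.0]  # token 3 plays the role of EOS
processed = run_chain(logits_processor, input_ids, raw_scores)
warped = run_chain(logits_warper, input_ids, processed)
```

Here EOS has the highest raw score but is masked by the minimum-length rule, so top-k keeps tokens 1 and 2 instead.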

Return:

[~generation_utils.SampleDecoderOnlyOutput], [~generation_utils.SampleEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour) or a [~generation_utils.SampleDecoderOnlyOutput] if model.config.is_encoder_decoder=False and return_dict_in_generate=True or a [~generation_utils.SampleEncoderDecoderOutput] if model.config.is_encoder_decoder=True.
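To show how the parameters fit together, here is a minimal sketch of a sampling loop that stops at a length cap or once every row has emitted eos_token_id, and fills finished rows with pad_token_id. It mirrors the general shape of the Hugging Face sampling loop under simplifying assumptions (equal-length prompts, no logits processors); `toy_next_token_scores` and `sample_loop` are hypothetical names, and the toy scoring function stands in for a real model forward pass.

```python
import math
import random


def toy_next_token_scores(token_ids, vocab_size):
    """Hypothetical stand-in for a model: favour the id after the last token."""
    scores = [0.0] * vocab_size
    scores[(token_ids[-1] + 1) % vocab_size] = 3.0
    return scores


def sample_loop(prompts, vocab_size, max_length, eos_token_id, pad_token_id, rng):
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)
    # Stop at the length cap or when every row has produced EOS.
    while len(sequences[0]) < max_length and not all(finished):
        for row, token_ids in enumerate(sequences):
            if finished[row]:
                # Finished rows are padded so all rows keep the same length.
                token_ids.append(pad_token_id)
                continue
            scores = toy_next_token_scores(token_ids, vocab_size)
            weights = [math.exp(s) for s in scores]  # unnormalised softmax
            next_token = rng.choices(range(vocab_size), weights=weights, k=1)[0]
            token_ids.append(next_token)
            finished[row] = next_token == eos_token_id
    return sequences


outputs = sample_loop(
    [[1], [2]],
    vocab_size=5,
    max_length=8,
    eos_token_id=4,
    pad_token_id=0,
    rng=random.Random(0),
)
```

The returned lists play the role of the plain torch.LongTensor result: one row per prompt, all rows the same length.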

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForCausalLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     TopKLogitsWarper,
...     TemperatureLogitsWarper,
...     StoppingCriteriaList,
...     MaxLengthCriteria,
... )
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.config.pad_token_id = model.config.eos_token_id
>>> input_prompt = "Today is a beautiful day, and"
>>> input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids
>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(15, eos_token_id=model.config.eos_token_id),
...     ]
... )
>>> # instantiate logits warpers
>>> logits_warper = LogitsProcessorList(
...     [
...         TopKLogitsWarper(50),
...         TemperatureLogitsWarper(0.7),
...     ]
... )
>>> stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
>>> torch.manual_seed(0)
>>> outputs = model.sample(
...     input_ids,
...     logits_processor=logits_processor,
...     logits_warper=logits_warper,
...     stopping_criteria=stopping_criteria,
... )
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Today is a beautiful day, and a wonderful day.\n\nI was lucky enough to meet the']
```
class lm_polygraph.utils.ensemble_utils.ensemble_sample.SampleEncoderDecoderOutput(sequences: LongTensor = None, scores: Tuple[FloatTensor] | None = None, models_scores: Tuple[List[FloatTensor]] | None = None, models_beam_next_token_logits: Tuple[FloatTensor] | None = None, pe_uncertainties: Dict[str, List[FloatTensor]] | None = None, ep_uncertainties: Dict[str, List[FloatTensor]] | None = None, encoder_attentions: Tuple[FloatTensor] | None = None, encoder_hidden_states: Tuple[FloatTensor] | None = None, decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None, cross_attentions: Tuple[Tuple[FloatTensor]] | None = None, decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None)[source]

Bases: ModelOutput

Base class for outputs of encoder-decoder generation models using sampling. Hidden states and attention weights of the decoder (respectively the encoder) can be accessed via the decoder_hidden_states and decoder_attentions attributes (respectively the encoder_hidden_states and encoder_attentions attributes).

Args:

sequences (torch.LongTensor of shape (batch_size*num_return_sequences, sequence_length)):

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

sequences_scores (torch.FloatTensor of shape (batch_size*num_return_sequences), optional, returned when output_scores=True is passed or when config.output_scores=True):

Final beam scores of the generated sequences.

scores (tuple(torch.FloatTensor) optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam transition scores for each vocabulary token at each generation step, consisting of log probabilities of tokens conditioned on the log softmax of previously generated tokens in this beam. A (max_length-1,)-shaped tuple of torch.FloatTensor, with each tensor of shape (batch_size*num_beams, config.vocab_size).

beam_indices (tuple(tuple(torch.LongTensor)), optional, returned when output_scores=True is passed or when config.output_scores=True):

Beam indices of generated token id at each generation step. torch.LongTensor of shape (batch_size*num_return_sequences, max_length-1).

encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer of the encoder) of shape (batch_size, num_heads, sequence_length, sequence_length).

encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size*num_beams*num_return_sequences, sequence_length, hidden_size).

decoder_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, num_heads, generated_length, sequence_length).

cross_attentions (tuple(tuple(torch.FloatTensor)), optional, returned when output_attentions=True is passed or config.output_attentions=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size, num_heads, generated_length, sequence_length).

decoder_hidden_states (tuple(tuple(torch.FloatTensor)), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True):

Tuple (one element for each generated token) of tuples (one element for each layer of the decoder) of torch.FloatTensor of shape (batch_size*num_beams*num_return_sequences, generated_length, hidden_size).
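The layout of scores described above (one entry per generation step, each indexed by batch row and vocabulary id) can be illustrated by looking up the score of every generated token. This is a sketch on nested Python lists rather than tensors. It assumes one sequence per batch row (no beams), that the first id in each sequence is a decoder start token with no score of its own, and that the values are log probabilities as the description states. `gather_token_scores` is a hypothetical helper, not part of this module.

```python
import math


def gather_token_scores(sequences, scores):
    """Return, for each sequence, the score of every generated token.

    sequences: list of token-id lists; position 0 is the decoder start token.
    scores: list over steps, where scores[step][row][token_id] is a log probability.
    """
    per_sequence = []
    for row, token_ids in enumerate(sequences):
        generated = token_ids[1:]  # step t scored the token at position t + 1
        per_sequence.append(
            [scores[step][row][tok] for step, tok in enumerate(generated)]
        )
    return per_sequence


# One sequence, two generation steps, vocabulary of three tokens.
sequences = [[0, 2, 1]]
scores = [
    [[math.log(0.1), math.log(0.2), math.log(0.7)]],  # step 0
    [[math.log(0.5), math.log(0.4), math.log(0.1)]],  # step 1
]
token_scores = gather_token_scores(sequences, scores)
sequence_log_prob = sum(token_scores[0])
```

Summing the per-token log probabilities gives the log probability of the whole generated continuation, here log(0.7 * 0.4). Padding added after an end-of-sequence token is not masked out in this sketch.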

cross_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_attentions: Tuple[Tuple[FloatTensor]] | None = None
decoder_hidden_states: Tuple[Tuple[FloatTensor]] | None = None
encoder_attentions: Tuple[FloatTensor] | None = None
encoder_hidden_states: Tuple[FloatTensor] | None = None
ep_uncertainties: Dict[str, List[FloatTensor]] | None = None
models_beam_next_token_logits: Tuple[FloatTensor] | None = None
models_scores: Tuple[List[FloatTensor]] | None = None
pe_uncertainties: Dict[str, List[FloatTensor]] | None = None
scores: Tuple[FloatTensor] | None = None
sequences: LongTensor = None