Basic usage
Installation
From GitHub
To install latest from main brach, clone the repo and conduct installation using pip, it is recommended to use virtual environment. Code example is presented below:
$ git clone https://github.com/IINemo/lm-polygraph.git
$ python3 -m venv env
$ source env/bin/activate
(env) $ cd lm-polygraph
(env) $ pip install .
Installation from GitHub is recommended if you want to explore notebooks with examples or use default benchmarking configurations, as they are included in the repository but not in the PyPI package. However code from the main branch may be unstable, so it is recommended to checkout to the latest stable release before installation:
$ git clone https://github.com/IINemo/lm-polygraph.git
$ git checkout tags/v0.3.0
$ python3 -m venv env
$ source env/bin/activate
(env) $ cd lm-polygraph
(env) $ pip install .
From PyPI
To install the latest stable version from PyPI, run:
$ python3 -m venv env
$ source env/bin/activate
$ pip install lm-polygraph
To install a specific version, run:
$ python3 -m venv env
$ source env/bin/activate
$ pip install lm-polygraph==0.3.0
Uncertainty for single input
Initialize the base model (encoder-decoder or decoder-only) and tokenizer from HuggingFace or a local file, and use them to initialize the
WhiteboxModelfor evaluation. For example, withbigscience/bloomz-560m:from transformers import AutoModelForCausalLM, AutoTokenizer from lm_polygraph.utils.model import WhiteboxModel model_path = "bigscience/bloomz-560m" base_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0") tokenizer = AutoTokenizer.from_pretrained(model_path) model = WhiteboxModel(base_model, tokenizer, model_path=model_path)
Alternatively, you can use
WhiteboxModel#from_pretrainedmethod to let LM-Polygraph download the model and tokenizer for you. However, this approach is deprecated and will be removed in the next major release.from lm_polygraph.utils.model import WhiteboxModel model = WhiteboxModel.from_pretrained( "bigscience/bloomz-3b", device_map="cuda:0", )
Specify UE method:
from lm_polygraph.estimators import * ue_method = MeanPointwiseMutualInformation()
Get predictions and their uncertainty scores:
from lm_polygraph.utils.manager import estimate_uncertainty input_text = "Who is George Bush?" ue = estimate_uncertainty(model, ue_method, input_text=input_text) print(ue) # UncertaintyOutput(uncertainty=-6.504108926902215, input_text='Who is George Bush?', generation_text=' President of the United States', model_path='bigscience/bloomz-560m')
More examples for obtaining uncertainty for single generation: examples/basic_example.ipynb
Benchmarking uncertainty estimation on a dataset
CLI
Recommended way of running benchmarks is by invoking the polygraph_eval script. Configuration for experiments is done via Hydra YAML config files.
Basic evaluation is invoked like so:
$ HYDRA_CONFIG=/absolute/path/to/config.yaml polygraph_eval
As usual with Hydra, you can override any parameter from the config file by specifying it in the command line. For example, to override the batch size:
$ HYDRA_CONFIG=/absolute/path/to/config.yaml polygraph_eval --batch_size=4
Examples of configuration files for several widely used datasets can be found in the examples/configs directory of the repository.
The results of evaluation will be saved as a serialized UEManager object to the directory specified by save_path in the config file. Refer to UE Manager for more information about the structure of the UEManager object.
To visualize benchmarking results, use and adapt for your case the notebooks/visualization_tables.ipynb notebook.
Python
It is also possible to run benchmarks from Python code. Examples of how to do this can be found in the following notebooks:
examples for the QA task with bigscience/bloomz-3b on the TriviaQA dataset:
examples/qa_example.ipynbexamples for the NMT task with facebook/wmt19-en-de on the WMT14 En-De dataset:
examples/mt_example.ipynbexamples for the ATS task with facebook/bart-large-cnn model on the XSUM dataset:
examples/ats_example.ipynb
To run more elaborate benchmarks directly from python, refer to the source code of polygraph_eval script.