LM-Polygraph Normalization Methods
LM-Polygraph implements several uncertainty normalization methods to convert raw uncertainty scores into more interpretable confidence values bounded between 0 and 1. Here are the key normalization approaches:
MinMax Normalization (MinMaxNormalizer in minmax.py)
Takes raw uncertainty scores and linearly scales them to the [0,1] range.
Flips the sign since uncertainty scores should be negatively correlated with confidence.
Uses scikit-learn’s MinMaxScaler internally.
Simple, but doesn’t maintain a direct connection to output quality.
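The arithmetic behind the flip-and-scale step can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: the function names `fit_minmax` and `minmax_confidence` are hypothetical, the scaling is written out by hand instead of calling MinMaxScaler, and clipping out-of-range scores to [0,1] is an assumption made here.

```python
def fit_minmax(calib_ue):
    """Return (lo, hi) of the negated calibration uncertainty scores."""
    flipped = [-u for u in calib_ue]
    return min(flipped), max(flipped)


def minmax_confidence(ue, lo, hi):
    """Negate one uncertainty score and scale it linearly into [0, 1]."""
    if hi == lo:
        return 0.5  # degenerate calibration set; the midpoint is an arbitrary choice
    scaled = (-ue - lo) / (hi - lo)
    # Assumption: scores outside the calibration range are clipped.
    return min(1.0, max(0.0, scaled))


calib = [0.2, 1.5, 3.0, 0.9]
lo, hi = fit_minmax(calib)
confidences = [minmax_confidence(u, lo, hi) for u in calib]
# The lowest uncertainty (0.2) maps to 1.0 and the highest (3.0) maps to 0.0.
```

Note that the result depends only on the range of uncertainty scores seen during calibration, which is why it carries no information about output quality.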
Quantile Normalization (QuantileNormalizer in quantile.py)
Transforms uncertainty scores into their corresponding percentile ranks.
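Uses the empirical CDF of the calibration scores to map new scores to the [0,1] range.
Produces confidence scores that are approximately uniformly distributed on data resembling the calibration set.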
May lose some granularity of original uncertainty estimates.
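A minimal sketch of the empirical-CDF mapping, using only the standard library. The names `fit_quantile` and `quantile_confidence` are hypothetical, and taking confidence as one minus the CDF (so that higher uncertainty gives lower confidence) is an assumption made for consistency with the sign flip described for MinMax, not something confirmed for the library.

```python
from bisect import bisect_right


def fit_quantile(calib_ue):
    """Store the calibration uncertainty scores in sorted order."""
    return sorted(calib_ue)


def quantile_confidence(ue, sorted_calib):
    """Return 1 - ECDF(ue), where ECDF is the fraction of calibration scores <= ue."""
    cdf = bisect_right(sorted_calib, ue) / len(sorted_calib)
    return 1.0 - cdf


sorted_calib = fit_quantile([0.2, 1.5, 3.0, 0.9])
conf_low = quantile_confidence(0.1, sorted_calib)   # below every calibration score -> 1.0
conf_mid = quantile_confidence(1.0, sorted_calib)   # two of four scores are <= 1.0 -> 0.5
conf_high = quantile_confidence(3.0, sorted_calib)  # at the largest score -> 0.0
```

The loss of granularity is visible here: any two scores that fall between the same pair of calibration points receive the same confidence, however far apart they are.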
Performance-Calibrated Confidence (PCC) Methods
Binned PCC (BinnedPCCNormalizer in binned_pcc.py)
Splits calibration data into bins based on uncertainty values.
Each bin has approximately an equal number of samples.
Confidence score is the mean output quality of samples in the corresponding bin.
Provides an interpretable connection between confidence and expected quality.
Drawback: Can change ordering of samples compared to raw uncertainty.
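The binning procedure can be sketched as follows, under the assumption that bins are formed by sorting the calibration samples by uncertainty and cutting them into equal-sized chunks. The function names and the `n_bins` parameter are hypothetical, and the toy data is made up for illustration.

```python
from bisect import bisect_right
from statistics import mean


def fit_binned_pcc(calib_ue, calib_quality, n_bins):
    """Return (edges, bin_means) for equal-frequency bins over uncertainty."""
    pairs = sorted(zip(calib_ue, calib_quality))
    n = len(pairs)
    if n_bins > n:
        raise ValueError("need at least one calibration sample per bin")
    edges, bin_means = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bin_means.append(mean(q for _, q in chunk))
        if b > 0:
            edges.append(chunk[0][0])  # smallest uncertainty in this bin
    return edges, bin_means


def binned_confidence(ue, edges, bin_means):
    """Look up the mean quality of the bin that the uncertainty score falls into."""
    return bin_means[bisect_right(edges, ue)]


ue = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
quality = [1.0, 0.75, 0.25, 0.25, 0.5, 0.125]
edges, bin_means = fit_binned_pcc(ue, quality, n_bins=3)
# bin_means is [0.875, 0.25, 0.3125]: the last bin scores higher than the middle one.
```

In this toy data a sample with uncertainty 0.55 receives confidence 0.3125 while a sample with uncertainty 0.35 receives 0.25, which is the reordering drawback noted above: bin means are not forced to decrease as uncertainty grows.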
Isotonic PCC (IsotonicPCCNormalizer in isotonic_pcc.py)
Uses Centered Isotonic Regression (CIR) to fit a monotonic relationship.
Maps uncertainty scores to output quality while preserving ordering.
Enforces monotonicity constraint to maintain uncertainty ranking.
More robust than the binned approach while maintaining interpretability.
Implementation based on CIR algorithm from Oron & Flournoy (2017).
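To show how a monotonicity constraint preserves ranking, here is a sketch of plain isotonic regression via the pool-adjacent-violators algorithm, fitted so that quality never increases with uncertainty. This is deliberately not CIR: the centering and interpolation steps of Oron & Flournoy are omitted, the lookup is a simple step function, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right


def fit_isotonic_decreasing(calib_ue, calib_quality):
    """Fit quality as a non-increasing step function of uncertainty (PAVA)."""
    pairs = sorted(zip(calib_ue, calib_quality))
    blocks = []  # each block: [first uncertainty in block, sum of quality, count]
    for u, q in pairs:
        blocks.append([u, q, 1])
        # Pool with the previous block while its mean is below the current mean,
        # since that would make quality rise as uncertainty rises.
        while (len(blocks) > 1
               and blocks[-2][1] / blocks[-2][2] < blocks[-1][1] / blocks[-1][2]):
            _, q_sum, count = blocks.pop()
            blocks[-1][1] += q_sum
            blocks[-1][2] += count
    starts = [b[0] for b in blocks]
    values = [b[1] / b[2] for b in blocks]
    return starts, values


def isotonic_confidence(ue, starts, values):
    """Return the fitted quality of the block whose range contains ue."""
    idx = max(0, bisect_right(starts, ue) - 1)
    return values[idx]


ue = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
quality = [1.0, 0.75, 0.25, 0.25, 0.5, 0.125]
starts, values = fit_isotonic_decreasing(ue, quality)
# The samples at 0.3, 0.4 and 0.5 are pooled into one block with mean quality 1/3.
```

Because the fitted values never increase with uncertainty, a sample with higher uncertainty can never receive a higher confidence than one with lower uncertainty; ties within a pooled block are the only loss of resolution.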
Common Interface: BaseUENormalizer
All normalizers follow a common interface defined in BaseUENormalizer:
fit(): Learns normalization parameters from calibration data.
transform(): Applies normalization to new uncertainty scores.
dumps()/loads(): Serialization support for fitted normalizers.
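The shape of such an interface might look like the sketch below. It is a hypothetical stand-in, not the library's BaseUENormalizer: the class names `SketchNormalizer` and `FlippedMinMax`, the argument lists, and the JSON serialization format are all assumptions chosen for illustration.

```python
import json
from abc import ABC, abstractmethod


class SketchNormalizer(ABC):
    """Hypothetical interface with the four methods named above."""

    @abstractmethod
    def fit(self, calib_ue, calib_quality=None):
        """Learn normalization parameters from calibration data."""

    @abstractmethod
    def transform(self, ue):
        """Map a list of uncertainty scores to confidences."""

    @abstractmethod
    def dumps(self):
        """Serialize the fitted parameters to a string."""

    @classmethod
    @abstractmethod
    def loads(cls, payload):
        """Rebuild a fitted normalizer from a serialized string."""


class FlippedMinMax(SketchNormalizer):
    """Toy implementation: negate the scores, then scale them into [0, 1]."""

    def __init__(self, lo=None, hi=None):
        self.lo, self.hi = lo, hi

    def fit(self, calib_ue, calib_quality=None):
        flipped = [-u for u in calib_ue]
        self.lo, self.hi = min(flipped), max(flipped)
        return self

    def transform(self, ue):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero
        return [min(1.0, max(0.0, (-u - self.lo) / span)) for u in ue]

    def dumps(self):
        return json.dumps({"lo": self.lo, "hi": self.hi})

    @classmethod
    def loads(cls, payload):
        return cls(**json.loads(payload))


fitted = FlippedMinMax().fit([0.5, 1.0, 2.5])
restored = FlippedMinMax.loads(fitted.dumps())
scores = restored.transform([0.5, 1.5, 2.5])  # [1.0, 0.5, 0.0]
```

Separating fit() from transform() means the calibration set is only needed once; the serialized parameters are enough to normalize scores later.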
Key Benefits of PCC Methods
Direct connection to output quality metrics.
Bounded interpretable range [0,1].
Maintained correlation with generation quality.
Easy to explain meaning to end users.
Highlight: Isotonic PCC
The Isotonic PCC approach provides the best balance between:
Maintaining the original uncertainty ranking.
Providing interpretable confidence scores.
Establishing a clear connection to expected output quality.
When using normalized scores, users can interpret them as estimates of relative output quality, making them more useful for downstream applications and human understanding.