Core Normalization Configuration

Overview

Core normalization configuration in LM-Polygraph defines how raw uncertainty scores are transformed into interpretable confidence values. These settings act as the defaults for every normalization method and apply unless another configuration layer overrides them.

Base Configuration Location

Core normalization configurations are located in:

/examples/configs/normalization/fit/default.yaml

Available Normalization Methods

1. MinMax Normalization

Linearly scales uncertainty scores to the [0, 1] range.

normalization:
  type: "minmax"
  clip: true  # Whether to clip values outside [0,1] range
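
As an illustration of what linear scaling with clipping means, here is a minimal Python sketch. It is not LM-Polygraph's implementation: the function names `fit_minmax` and `minmax_normalize` are hypothetical, and the sketch assumes the minimum and maximum are taken from a set of calibration scores.

```python
def fit_minmax(calibration_scores):
    """Return the (min, max) of the calibration scores (hypothetical helper)."""
    return min(calibration_scores), max(calibration_scores)


def minmax_normalize(score, lo, hi, clip=True):
    """Linearly rescale `score` so that lo maps to 0 and hi maps to 1."""
    if hi == lo:
        # Degenerate calibration set: every score is identical.
        return 0.0
    scaled = (score - lo) / (hi - lo)
    if clip:
        # Mirrors the `clip: true` option: keep the result inside [0, 1].
        scaled = min(1.0, max(0.0, scaled))
    return scaled


lo, hi = fit_minmax([2.0, 4.0, 6.0])
in_range = minmax_normalize(4.0, lo, hi)
clipped = minmax_normalize(10.0, lo, hi)
unclipped = minmax_normalize(10.0, lo, hi, clip=False)
```

With clipping disabled, a score outside the calibration range produces a value outside [0, 1], which is the case the `clip` option guards against.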

2. Quantile Normalization

Transforms scores into percentile ranks using the empirical CDF of the calibration scores.

normalization:
  type: "quantile"
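
The empirical CDF idea can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the library's code: `fit_quantile` and `quantile_normalize` are hypothetical names, and the rank is defined here as the fraction of calibration scores less than or equal to the new score.

```python
from bisect import bisect_right


def fit_quantile(calibration_scores):
    """Store the calibration scores in sorted order (hypothetical helper)."""
    return sorted(calibration_scores)


def quantile_normalize(score, sorted_scores):
    """Empirical CDF: fraction of calibration scores <= `score`."""
    return bisect_right(sorted_scores, score) / len(sorted_scores)


sorted_scores = fit_quantile([3.0, 1.0, 2.0, 4.0])
rank = quantile_normalize(2.0, sorted_scores)
```

Because the output depends only on ranks, it is insensitive to the scale and outliers of the raw scores, unlike min-max scaling.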

3. Binned Performance-Calibrated Confidence (Binned PCC)

Groups uncertainty scores into bins and maps each bin to a confidence value based on the output quality observed for that bin.

normalization:
  type: "binned_pcc"
  params:
    num_bins: 10  # Number of bins for mapping
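
To make the binning idea concrete, the following Python sketch builds equal-count bins over calibration uncertainties and uses the mean quality in each bin as its confidence. It is a simplified illustration, not LM-Polygraph's implementation: the names `fit_binned_pcc` and `binned_pcc_confidence`, the equal-count binning rule, and the use of the mean are all assumptions.

```python
from bisect import bisect_left


def fit_binned_pcc(uncertainties, qualities, num_bins=10):
    """Return (upper_edges, bin_confidences) from calibration pairs."""
    pairs = sorted(zip(uncertainties, qualities))
    n = len(pairs)
    if n == 0:
        raise ValueError("calibration set is empty")
    num_bins = min(num_bins, n)  # never create more bins than samples
    upper_edges, bin_confidences = [], []
    for b in range(num_bins):
        chunk = pairs[b * n // num_bins:(b + 1) * n // num_bins]
        upper_edges.append(chunk[-1][0])  # largest uncertainty in the bin
        # Assumed rule: bin confidence is the mean quality in the bin.
        bin_confidences.append(sum(q for _, q in chunk) / len(chunk))
    return upper_edges, bin_confidences


def binned_pcc_confidence(score, upper_edges, bin_confidences):
    """Look up the confidence of the bin that `score` falls into."""
    idx = bisect_left(upper_edges, score)
    idx = min(idx, len(bin_confidences) - 1)  # scores above the last edge
    return bin_confidences[idx]


edges, confs = fit_binned_pcc([0.1, 0.2, 0.8, 0.9], [1.0, 1.0, 0.0, 0.5], num_bins=2)
confidence = binned_pcc_confidence(0.5, edges, confs)
```

Nothing in this sketch forces neighbouring bins to have ordered confidences, so two scores can swap rank after mapping; that is the gap the isotonic variant addresses.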

4. Isotonic Performance-Calibrated Confidence (Isotonic PCC)

Fits an isotonic (monotonic) regression from uncertainty to confidence, so the relative ordering of scores is preserved.

normalization:
  type: "isotonic_pcc"
  params:
    y_min: 0.0  # Minimum confidence value
    y_max: 1.0  # Maximum confidence value
    increasing: false  # Whether the mapping is increasing; false means higher uncertainty gives lower confidence
    out_of_bounds: "clip"  # How to handle out-of-range values
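
The sketch below shows one way a monotone uncertainty-to-confidence mapping can be fitted, using the pool-adjacent-violators algorithm in plain Python. It is an illustration only and not the library's code: the function names are hypothetical, the calibration uncertainties are assumed to be distinct, and out-of-range inputs are handled by clamping to the nearest fitted endpoint (one reading of `out_of_bounds: "clip"`).

```python
from bisect import bisect_right


def fit_isotonic_pcc(uncertainties, qualities, y_min=0.0, y_max=1.0, increasing=False):
    """Fit a monotone step of confidences over sorted uncertainties (PAV)."""
    pairs = sorted(zip(uncertainties, qualities))
    xs = [u for u, _ in pairs]
    # Fit an increasing function; negate targets to get a decreasing fit.
    ys = [q if increasing else -q for _, q in pairs]
    blocks = []  # each block is [sum_of_targets, count]
    for y in ys:
        blocks.append([y, 1])
        # Merge neighbouring blocks while their means violate the ordering.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        value = total / count if increasing else -(total / count)
        fitted.extend([min(y_max, max(y_min, value))] * count)
    return xs, fitted


def isotonic_pcc_confidence(score, xs, fitted):
    """Linearly interpolate between fitted points; clamp outside the range."""
    if score <= xs[0]:
        return fitted[0]
    if score >= xs[-1]:
        return fitted[-1]
    hi = bisect_right(xs, score)
    lo = hi - 1
    t = (score - xs[lo]) / (xs[hi] - xs[lo])
    return fitted[lo] + t * (fitted[hi] - fitted[lo])


xs, fitted = fit_isotonic_pcc([0.1, 0.2, 0.3, 0.4], [1.0, 0.4, 0.6, 0.0])
confidence = isotonic_pcc_confidence(0.15, xs, fitted)
```

In the example, the qualities 0.4 and 0.6 are out of order for a decreasing fit, so the algorithm pools them into a single level at their mean. The parameter names `y_min`, `y_max`, `increasing`, and `out_of_bounds` also match the constructor arguments of scikit-learn's `IsotonicRegression`, which would be a natural production choice; whether LM-Polygraph uses it is not stated here.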

Common Parameters

Calibration Strategy

normalization:
  calibration:
    strategy: "dataset_specific"  # or "global"
    background_dataset: null  # Optional background dataset for global calibration
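
The difference between the two strategies can be pictured as a choice of which calibration scores the normalizer is fitted on. The Python sketch below is a hypothetical illustration: the function `select_calibration_scores` and the dictionary layout are assumptions, and it does not model how a background dataset would be loaded or scored.

```python
def select_calibration_scores(scores_by_dataset, strategy, target_dataset=None):
    """Pick the calibration pool for a normalizer (hypothetical helper)."""
    if strategy == "dataset_specific":
        # Fit only on scores from the dataset being evaluated.
        return list(scores_by_dataset[target_dataset])
    if strategy == "global":
        # Pool scores from every available dataset into one calibration set.
        return [s for scores in scores_by_dataset.values() for s in scores]
    raise ValueError(f"unknown calibration strategy: {strategy!r}")


scores = {"dataset_a": [0.1, 0.2], "dataset_b": [0.9]}
local_pool = select_calibration_scores(scores, "dataset_specific", "dataset_a")
global_pool = select_calibration_scores(scores, "global")
```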

Data Processing

normalization:
  processing:
    ignore_nans: true  # Whether to ignore NaN values in calibration
    normalize_metrics: true  # Whether to normalize quality metrics
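
One plausible reading of these two flags is sketched below in Python. Both behaviours are assumptions made for illustration: `ignore_nans` is taken to mean "drop calibration pairs that contain NaN, otherwise raise", and `normalize_metrics` is taken to mean "rescale quality values to [0, 1]". The function name `prepare_calibration_pairs` is hypothetical.

```python
import math


def prepare_calibration_pairs(uncertainties, qualities,
                              ignore_nans=True, normalize_metrics=True):
    """Clean (uncertainty, quality) pairs before fitting a normalizer."""
    pairs = list(zip(uncertainties, qualities))
    if any(math.isnan(u) or math.isnan(q) for u, q in pairs):
        if not ignore_nans:
            # Assumed behaviour when NaNs are not ignored: fail loudly.
            raise ValueError("NaN found in calibration data")
        pairs = [(u, q) for u, q in pairs
                 if not (math.isnan(u) or math.isnan(q))]
    if normalize_metrics and pairs:
        # Assumed behaviour: min-max rescale the quality metric to [0, 1].
        q_lo = min(q for _, q in pairs)
        q_hi = max(q for _, q in pairs)
        span = q_hi - q_lo
        pairs = [(u, (q - q_lo) / span if span else 0.0) for u, q in pairs]
    return pairs


cleaned = prepare_calibration_pairs([0.1, float("nan"), 0.3], [2.0, 5.0, 4.0])
```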

Caching

normalization:
  cache:
    enabled: true
    path: "${cache_path}/normalization"
    version: "v1"

Usage Examples

Basic MinMax Normalization

normalization:
  type: "minmax"
  clip: true
  calibration:
    strategy: "dataset_specific"

Global Isotonic PCC

normalization:
  type: "isotonic_pcc"
  params:
    y_min: 0.0
    y_max: 1.0
    increasing: false
  calibration:
    strategy: "global"
    background_dataset: "allenai/c4"

Binned PCC with Custom Settings

normalization:
  type: "binned_pcc"
  params:
    num_bins: 20
  processing:
    ignore_nans: false
    normalize_metrics: true
  cache:
    enabled: true

Best Practices

1. Method Selection

  • Use MinMax/Quantile for simple scaling needs

  • Use PCC methods when interpretability is crucial, since their output is calibrated against output quality

  • Prefer Isotonic PCC when preserving score ordering is important

2. Calibration Strategy

  • Use dataset-specific calibration when possible

  • Use global calibration when consistency across tasks is needed

  • Consider using a background dataset for more robust global calibration

3. Performance Considerations

  • Enable caching for large datasets

  • Adjust the bin count (num_bins) to the calibration set size; too many bins on a small set leaves few samples per bin

  • Monitor memory usage with large calibration sets

Integration with Other Configs

Core normalization settings can be overridden by:

  • Task-specific configs

  • Model-specific configs

  • Instruction-tuned model configs

Core settings serve as defaults when not specified in other configuration layers.
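
The layering described above behaves like a recursive dictionary merge in which more specific layers win key by key. The Python sketch below illustrates that idea only: `merge_configs` is a hypothetical helper, the example override values are made up, and the actual precedence order among task, model, and instruction-tuned layers is not specified here.

```python
import copy


def merge_configs(base, override):
    """Return a copy of `base` with `override` applied key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in nested sections keep their defaults.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


core = {"normalization": {"type": "minmax", "clip": True,
                          "cache": {"enabled": True, "version": "v1"}}}
task_override = {"normalization": {"type": "binned_pcc",
                                   "cache": {"enabled": False}}}
effective = merge_configs(core, task_override)
```

Here `effective` takes `type` and `cache.enabled` from the override, while `clip` and `cache.version` fall back to the core defaults.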