Population Stability Index (PSI) Metric

The population_stability_index metric measures distribution shifts between baseline and current populations by comparing the proportion of observations across different bands or categories.

Metric Type: population_stability_index

PSI Calculation

PSI quantifies population drift by calculating:

PSI = Σ (Current% - Baseline%) × ln(Current% / Baseline%)

Here Current% and Baseline% are each band's share of its period's total volume, and the sum runs over all bands/categories with non-zero volume in both periods. Common interpretation thresholds:

  • < 0.1: No significant population shift
  • 0.1 - 0.2: Minor population shift requiring monitoring
  • > 0.2: Major population shift requiring investigation
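As a minimal sketch of the formula above (not the metric's actual implementation), the sum can be written in a few lines of Python. The function name `psi` and the sample proportions are hypothetical, and the inputs are assumed to be strictly positive fractions given in the same band order.

```python
import math

def psi(baseline_pct, current_pct):
    """Sum of (current - baseline) * ln(current / baseline) over bands.

    Both arguments are sequences of proportions (fractions of the period
    total) in the same band order. Every value must be strictly positive.
    """
    return sum(
        (c - b) * math.log(c / b)
        for b, c in zip(baseline_pct, current_pct)
    )

# Illustrative data: four equal bands at baseline, skewed in the current period.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.40, 0.30, 0.20, 0.10]

print(round(psi(baseline, current), 4))  # 0.2282
```

A value of about 0.23 falls above the 0.2 threshold, so this illustrative shift would be read as a major one.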

Zero-Volume Band Handling

The metric provides two approaches for handling bands with zero volume, null values, or NaN values in either the baseline or current period:

Default Approach: Filtering

By default (laplace_smoothing=False), bands with zero, null, or NaN volumes are automatically excluded from PSI calculations:

  • Filtering: Bands where baseline_volume = 0/null/NaN OR current_volume = 0/null/NaN are removed before percentage calculations
  • Clean Calculation: PSI is computed only on bands with positive volumes in both periods
  • Mathematical Safety: Eliminates divide-by-zero and log(0) errors without artificial smoothing
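The filtering steps above can be sketched as follows. This is an illustration under stated assumptions rather than the metric's own code: the helper names `has_volume` and `psi_filtered` and the sample rows are hypothetical, and each row is assumed to be a `(band, baseline_volume, current_volume)` tuple.

```python
import math

def has_volume(v):
    # None fails the first check; NaN fails the comparison (NaN > 0 is False).
    return v is not None and v > 0

def psi_filtered(rows):
    """rows: list of (band, baseline_volume, current_volume) tuples."""
    # Drop bands without positive volume in BOTH periods before any
    # percentages are computed, so totals reflect only the kept bands.
    kept = [(band, b, c) for band, b, c in rows if has_volume(b) and has_volume(c)]
    total_b = sum(b for _, b, _ in kept)
    total_c = sum(c for _, _, c in kept)
    value = sum(
        (c / total_c - b / total_b) * math.log((c / total_c) / (b / total_b))
        for _, b, c in kept
    )
    return value, [band for band, _, _ in kept]

rows = [
    ("A", 100, 120),
    ("B", 200, 180),
    ("C", 0, 50),             # new band: excluded
    ("D", None, 10),          # null baseline: excluded
    ("E", 30, float("nan")),  # NaN current: excluded
]
value, kept_bands = psi_filtered(rows)
print(kept_bands, round(value, 4))
```

Only bands A and B survive here, so the 50 observations that arrived in band C do not influence the result.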

Laplace Smoothing Approach

When laplace_smoothing=True, the metric applies Laplace smoothing to include all bands in the PSI calculation:

  • Value Conversion: All null and NaN values are first converted to 0
  • Smoothing Application: Adds 1 to each band's volume (both baseline and current)
  • Inclusive Calculation: All bands are retained and contribute to the PSI calculation
  • Mathematical Stability: Prevents division by zero while preserving information about zero-volume bands
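A rough sketch of the smoothing steps listed above, assuming rows of `(band, baseline_volume, current_volume)`. The names `to_count` and `psi_laplace` are hypothetical, and the real metric may differ in details such as data types.

```python
import math

def to_count(v):
    # Step 1: null and NaN volumes are converted to 0.
    if v is None or (isinstance(v, float) and math.isnan(v)):
        return 0
    return v

def psi_laplace(rows):
    """rows: list of (band, baseline_volume, current_volume) tuples."""
    # Step 2: add 1 to every band's volume in both periods.
    smoothed = [(band, to_count(b) + 1, to_count(c) + 1) for band, b, c in rows]
    total_b = sum(b for _, b, _ in smoothed)
    total_c = sum(c for _, _, c in smoothed)
    # Step 3: every band is retained and contributes a finite term.
    value = sum(
        (c / total_c - b / total_b) * math.log((c / total_c) / (b / total_b))
        for _, b, c in smoothed
    )
    return value, [band for band, _, _ in smoothed]

rows = [("A", 100, 120), ("B", 200, 180), ("C", 0, 50)]
value, bands = psi_laplace(rows)
print(bands, round(value, 2))
```

With this illustrative data the result is about 0.57, and almost all of it comes from band C, which had no baseline volume.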

Laplace Smoothing Benefits

  • Captures Extreme Shifts: New bands (0 → positive) and disappeared bands (positive → 0) are included in PSI calculation
  • Conservative Smoothing: Adding 1 to each band has minimal impact on bands with large volumes while enabling calculation for zero-volume bands
  • Complete Picture: Provides a more comprehensive view of population shifts across all bands

When to Use Laplace Smoothing

  • Small Sample Sizes: When bands may legitimately have zero observations
  • New Segment Monitoring: When tracking emergence or disappearance of risk segments is critical
  • Conservative Risk Management: When you want to capture all potential distribution shifts
  • Regulatory Requirements: When complete population coverage is required for compliance

Important Considerations

  • Hidden Extreme Changes (without smoothing): A completely new band (0 → positive) or a disappeared band (positive → 0) would make its PSI term infinite, so default filtering drops the band and its shift is not reflected in the final metric
  • Potential Underestimation (without smoothing): The calculated PSI may underestimate true population shift when significant bands appear or disappear
  • Model Risk: New risk segments or complete segment disappearance may not be captured in the PSI value without smoothing

Alternative Approaches

If monitoring completely new or disappeared segments is critical for your use case, consider:

  • Enable Laplace smoothing with laplace_smoothing=True
  • Separate tracking of bands that appear/disappear between periods
  • Using band coverage metrics alongside PSI
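Separate tracking of appearing and disappearing bands could look like the sketch below. The function `band_coverage` and its report keys are hypothetical and are not an output of the PSI metric. Rows are assumed to be `(band, baseline_volume, current_volume)` tuples.

```python
def has_volume(v):
    # None fails the first check; NaN fails the comparison (NaN > 0 is False).
    return v is not None and v > 0

def band_coverage(rows):
    """Classify each band by whether it has volume in each period."""
    report = {"retained": [], "appeared": [], "disappeared": [], "empty": []}
    for band, b, c in rows:
        if has_volume(b) and has_volume(c):
            key = "retained"
        elif has_volume(c):
            key = "appeared"      # 0 -> positive
        elif has_volume(b):
            key = "disappeared"   # positive -> 0
        else:
            key = "empty"
        report[key].append(band)
    return report

rows = [("A", 100, 120), ("B", 200, 0), ("C", None, 50), ("D", 0, 0)]
print(band_coverage(rows))
```

Reporting the `appeared` and `disappeared` lists next to the PSI value makes it visible when default filtering has removed bands.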

Configuration Fields

Record-Level Data Format

For individual records with band assignments and time period indicators:

metrics:
  population_stability:
    metric_type: "population_stability_index"
    config:
      name: ["model_stability"]
      data_format: "record_level"
      band_column: "risk_band" # Column with band/category assignments (A, B, C, etc.)
      baseline_column: "is_baseline" # Binary indicator for baseline period (1/0)
      current_column: "is_current" # Binary indicator for current period (1/0)
      segment: [["model_version"]] # Optional: segmentation columns
      laplace_smoothing: false # Optional: enable Laplace smoothing for zero-volume bands
      dataset: "loan_portfolio"
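To show how the record-level layout relates to per-band volumes, the following sketch sums the two binary indicators by band. It assumes that is how record-level data reduces to counts. The function `band_counts` and the sample records are hypothetical, and the column names are taken from the configuration above.

```python
from collections import Counter

def band_counts(records, band_column, baseline_column, current_column):
    """Sum the 1/0 period indicators per band.

    Returns a sorted list of (band, baseline_volume, current_volume).
    """
    baseline, current = Counter(), Counter()
    for rec in records:
        band = rec[band_column]
        baseline[band] += rec[baseline_column]
        current[band] += rec[current_column]
    bands = sorted(set(baseline) | set(current))
    return [(band, baseline[band], current[band]) for band in bands]

records = [
    {"risk_band": "A", "is_baseline": 1, "is_current": 0},
    {"risk_band": "A", "is_baseline": 0, "is_current": 1},
    {"risk_band": "B", "is_baseline": 1, "is_current": 0},
    {"risk_band": "A", "is_baseline": 1, "is_current": 0},
    {"risk_band": "C", "is_baseline": 0, "is_current": 1},
]
print(band_counts(records, "risk_band", "is_baseline", "is_current"))
# [('A', 2, 1), ('B', 1, 0), ('C', 0, 1)]
```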

Summary-Level Data Format

For pre-aggregated data with baseline and current counts by band:

metrics:
  summary_stability:
    metric_type: "population_stability_index"
    config:
      name: ["grade_stability"]
      data_format: "summary_level"
      band_column: "risk_grade" # Column with band/category identifiers
      baseline_volume: "baseline_count" # Column with baseline period counts
      current_volume: "current_count" # Column with current period counts
      segment: [["geography"]] # Optional: segmentation columns
      laplace_smoothing: false # Optional: enable Laplace smoothing for zero-volume bands
      dataset: "risk_grade_summary"

Required Columns

Record-Level Format

  • band_column: Categories or bands for grouping (string/categorical)
  • baseline_column: Binary indicator (1 for baseline period, 0 otherwise)
  • current_column: Binary indicator (1 for current period, 0 otherwise)
  • laplace_smoothing: Boolean configuration flag (not a data column) that enables Laplace smoothing (optional, default: false)

Summary-Level Format

  • band_column: Categories or bands for grouping (string/categorical)
  • baseline_volume: Count of observations in baseline period (positive numbers, 0, null, or NaN; negative values not allowed)
  • current_volume: Count of observations in current period (positive numbers, 0, null, or NaN; negative values not allowed)
  • laplace_smoothing: Boolean configuration flag (not a data column) that enables Laplace smoothing (optional, default: false)

Note: When laplace_smoothing=True, null and NaN values in volume columns are converted to 0, then 1 is added to all band volumes for mathematical stability.

Output Columns

The PSI metric returns the following columns:

  • group_key: Segmentation grouping (if segments specified)
  • baseline_volume: Total observations in baseline period
  • current_volume: Total observations in current period
  • volume: Total observations across both periods
  • bucket_count: Number of distinct bands/categories
  • psi: Population Stability Index value
  • curve_data: Plot data for PSI band analysis (struct array with band, baseline_pct, current_pct, and psi_component)

Curve Data for Visualization

The curve_data column contains band-level data for creating PSI distribution visualizations in BI tools like Power BI. Each data point contains:

  • band: Band/category identifier
  • baseline_pct: Percentage of baseline period observations in this band
  • current_pct: Percentage of current period observations in this band
  • psi_component: Individual band contribution to the overall PSI value

This data enables creation of:

  • Band Distribution Charts: Side-by-side comparison of baseline vs current percentages
  • PSI Component Analysis: Visualization of which bands contribute most to population shift
  • Heat Maps: Color-coded representation of distribution changes across bands
  • Waterfall Charts: Showing how each band contributes to the total PSI

Example Usage in BI Tools:

  1. Extract curve data from the PSI result
  2. Create bar charts with band on the x-axis and baseline_pct and current_pct as two series on a shared y-axis
  3. Use psi_component values for color coding or additional analysis
  4. Add reference lines or thresholds for PSI component significance

Interpretation Guidelines:

  • Bands with large differences between baseline_pct and current_pct indicate distribution shifts
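  • psi_component values are never negative: (Current% - Baseline%) and ln(Current% / Baseline%) always share the same sign, so a larger value means a larger shift in either direction
  • To see the direction of a shift, compare current_pct with baseline_pct: a higher current_pct means the band gained share, a lower one means it lost share
  • With default filtering (laplace_smoothing=False), bands with zero volume in either period are excluded from the curve data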
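The band-level fields can be reproduced with a short sketch that follows the PSI formula given earlier. The function `curve_points` is hypothetical. It assumes all volumes are positive and expresses `baseline_pct` and `current_pct` as fractions, whereas the actual output may use a different scale.

```python
import math

def curve_points(rows):
    """rows: list of (band, baseline_volume, current_volume), all positive."""
    total_b = sum(b for _, b, _ in rows)
    total_c = sum(c for _, _, c in rows)
    points = []
    for band, b, c in rows:
        b_pct, c_pct = b / total_b, c / total_c
        points.append({
            "band": band,
            "baseline_pct": b_pct,
            "current_pct": c_pct,
            # Product of two same-signed factors, so never negative.
            "psi_component": (c_pct - b_pct) * math.log(c_pct / b_pct),
        })
    return points

points = curve_points([("A", 100, 120), ("B", 200, 180)])
for p in points:
    direction = "gained" if p["current_pct"] > p["baseline_pct"] else "lost"
    print(p["band"], direction, round(p["psi_component"], 4))
```

In this example band A gained share and band B lost share, yet both components are positive, and summing them gives the overall PSI.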

Example Usage

Monitoring Model Score Distribution Shifts

metrics:
  score_stability:
    metric_type: "population_stability_index"
    config:
      name: ["score_drift_monitoring"]
      data_format: "record_level"
      band_column: "score_decile"
      baseline_column: "is_baseline_month"
      current_column: "is_current_month"
      segment: [["product_line", "region"]]
      dataset: "monthly_scores"

Risk Grade Migration Analysis

metrics:
  grade_migration:
    metric_type: "population_stability_index"
    config:
      name: ["risk_grade_stability"]
      data_format: "summary_level"
      band_column: "internal_rating"
      baseline_volume: "q1_volume"
      current_volume: "q2_volume"
      segment: [["business_unit"]]
      dataset: "quarterly_ratings"

New Product Launch Monitoring with Laplace Smoothing

When monitoring population stability for new products where some risk bands may have zero observations in early periods:

metrics:
  new_product_stability:
    metric_type: "population_stability_index"
    config:
      name: ["launch_population_monitoring"]
      data_format: "summary_level"
      band_column: "credit_score_band"
      baseline_volume: "pilot_volume"
      current_volume: "launch_volume"
      laplace_smoothing: true # Enable to include zero-volume bands
      segment: [["launch_region"]]
      dataset: "product_launch_data"

Data Preparation Tips

  1. Band Creation: Ensure bands capture meaningful risk or characteristic segments
  2. Period Definition: Clearly define baseline and current periods with non-overlapping data
  3. Sample Size: Include sufficient observations in each band for reliable PSI calculation
  4. Missing Values: Handle missing bands consistently between periods
  5. Segmentation: Use segments to identify specific population subsets experiencing drift

Interpretation Guidelines

  • PSI < 0.1: Population is stable, no action required
  • PSI 0.1-0.2: Minor shift detected, monitor trends and investigate drivers
  • PSI > 0.2: Significant shift detected, review model performance and consider recalibration
  • Very High PSI (>0.5): Extreme shift, immediate investigation required
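These bands can be encoded as a small helper for alerting or reporting. The function name `classify_psi` and the labels are hypothetical. The handling of values exactly equal to 0.1, 0.2, or 0.5 is an assumption, since the guidelines do not say which side of a boundary they fall on.

```python
def classify_psi(value):
    """Map a PSI value to one of the guideline bands above."""
    if value < 0.1:
        return "stable"
    if value <= 0.2:   # assumption: boundaries 0.1 and 0.2 count as minor
        return "minor shift"
    if value <= 0.5:   # assumption: exactly 0.5 is not yet extreme
        return "significant shift"
    return "extreme shift"

for v in (0.03, 0.15, 0.35, 0.8):
    print(v, classify_psi(v))
```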

Interpreting Results with Default Filtering

  • Monitor Band Coverage: Check if the number of bands included in PSI calculation has decreased
  • Investigate Missing Bands: Separately analyze bands that appear or disappear between periods
  • Consider Context: A low PSI with many excluded bands may still indicate significant population changes

Interpreting Results with Laplace Smoothing

  • Comprehensive Coverage: All bands contribute to PSI calculation, including those with zero/null/NaN volumes
  • Conservative Estimates: PSI values may be higher than under default filtering because zero-volume bands are included, and noticeably higher when a band appears or disappears entirely, providing a more conservative risk assessment
  • New Band Detection: Bands that appear or disappear between periods are captured in the PSI calculation
  • Volume Impact: The +1 smoothing has minimal impact on bands with large volumes but enables calculation for sparse bands
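The volume-impact point can be checked numerically. The sketch below compares PSI with and without the +1 adjustment for two illustrative datasets that have the same proportions but very different volumes. The function `psi_counts` and its `add` parameter are hypothetical.

```python
import math

def psi_counts(baseline, current, add=0):
    """PSI from per-band counts, optionally adding `add` to every count."""
    b = [v + add for v in baseline]
    c = [v + add for v in current]
    total_b, total_c = sum(b), sum(c)
    return sum(
        (y / total_c - x / total_b) * math.log((y / total_c) / (x / total_b))
        for x, y in zip(b, c)
    )

large = ([10000, 20000, 30000], [11000, 19000, 30000])
small = ([10, 20, 30], [11, 19, 30])

# Absolute change in PSI caused by the +1 smoothing.
large_gap = abs(psi_counts(*large, add=1) - psi_counts(*large))
small_gap = abs(psi_counts(*small, add=1) - psi_counts(*small))
print(large_gap < small_gap)  # True
```

The adjustment moves the large-volume result far less than the small-volume one, which is why smoothed values deserve more caution when counts are sparse.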

Fan-Out Expansion

The Population Stability Index metric supports fan-out expansion when used in workflows. Each combination of segment values will generate a separate PSI calculation, enabling detailed stability monitoring across different population subsets.
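Conceptually, fan-out produces one PSI value per segment combination. The sketch below imitates that with plain Python and is not the workflow engine's implementation. The function `psi_by_segment` and the sample rows are hypothetical, and default filtering is assumed.

```python
import math
from collections import defaultdict

def psi_by_segment(rows):
    """rows: (segment_value, band, baseline_volume, current_volume) tuples.

    Returns {segment_value: psi}, one independent calculation per segment.
    """
    groups = defaultdict(list)
    for segment, _band, b, c in rows:
        if b > 0 and c > 0:  # default filtering within each segment
            groups[segment].append((b, c))
    results = {}
    for segment, pairs in groups.items():
        total_b = sum(b for b, _ in pairs)
        total_c = sum(c for _, c in pairs)
        results[segment] = sum(
            (c / total_c - b / total_b) * math.log((c / total_c) / (b / total_b))
            for b, c in pairs
        )
    return results

rows = [
    ("north", "A", 100, 120),
    ("north", "B", 200, 180),
    ("south", "A", 50, 50),
    ("south", "B", 50, 50),
]
print(psi_by_segment(rows))
```

Here the `north` segment shows a small shift while `south` is unchanged, a difference that a single portfolio-wide PSI would blur.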