Stability Metrics

Stability metrics for monitoring population shifts and distribution changes over time.

stability

Population Stability Index helper functions.

This module provides convenient helper functions for population stability monitoring.

population_stability_index

population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    band_column: str,
    baseline_column: str,
    current_column: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    band_column: str,
    baseline_volume: str,
    current_volume: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Population Stability Index (psi) for record-level or summary-level data.

The Population Stability Index measures distributional stability of predicted probabilities or model scores over time by comparing the distribution across bands between baseline and current periods.

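psi Formula: Σ (Current% - Baseline%) * ln(Current% / Baseline%), summed over all bands, where Current% and Baseline% are each band's share of the total volume in the respective period.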
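As a worked check of the formula, the sketch below computes the psi by hand in plain Python for three bands. The helper name `psi_from_volumes` is hypothetical and not part of this module, and the sketch assumes every band has a non-zero volume in both periods.

```python
import math


def psi_from_volumes(baseline: list[float], current: list[float]) -> float:
    """Sum (Current% - Baseline%) * ln(Current% / Baseline%) over bands.

    Hypothetical helper for illustration; assumes all volumes are > 0.
    """
    baseline_total = sum(baseline)
    current_total = sum(current)
    psi = 0.0
    for b, c in zip(baseline, current):
        baseline_pct = b / baseline_total  # band share in the baseline period
        current_pct = c / current_total    # band share in the current period
        psi += (current_pct - baseline_pct) * math.log(current_pct / baseline_pct)
    return psi


# Volumes for bands A, B, C (same figures as the summary-level example)
value = psi_from_volumes([1000, 1500, 500], [1200, 1200, 600])
print(round(value, 4))  # 0.0405
```

Here the shares move from (33.3%, 50%, 16.7%) to (40%, 40%, 20%), and the three terms sum to 0.1 * ln(1.5) ≈ 0.0405.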

psi Interpretation
  • psi < 0.1: Stable (no significant change)
  • 0.1 ≤ psi < 0.2: Moderate shift (monitor closely)
  • psi ≥ 0.2: Significant shift (investigate/retrain)
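
The thresholds above can be applied to a computed value with a small lookup. This is a minimal sketch; `interpret_psi` is a hypothetical helper and is not provided by this module.

```python
def interpret_psi(psi: float) -> str:
    """Map a psi value to the interpretation bands listed above."""
    if psi < 0.1:
        return "stable"
    if psi < 0.2:
        return "moderate shift"
    return "significant shift"


# Boundary values fall into the higher band (0.1 is already "moderate shift")
for value in (0.04, 0.1, 0.25):
    print(value, interpret_psi(value))
```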

Record-level usage (data_format="record_level"): Required parameters: band_column, baseline_column, current_column

Summary-level usage (data_format="summary_level"): Required parameters: band_column, baseline_volume, current_volume
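
To show how the two formats relate, the sketch below collapses record-level rows into per-band volumes in plain Python. It assumes the indicator columns are 0/1 flags, as in the examples on this page; how the library aggregates internally is not documented here, so treat this as an illustration only.

```python
from collections import Counter

# Record-level rows: (band, is_baseline, is_current), assuming 0/1 flags
records = [
    ("A", 1, 0), ("A", 1, 0), ("B", 1, 0),
    ("A", 0, 1), ("B", 0, 1), ("B", 0, 1),
]

baseline_volume = Counter()
current_volume = Counter()
for band, is_baseline, is_current in records:
    baseline_volume[band] += is_baseline
    current_volume[band] += is_current

# Summary-level rows: one (band, baseline_volume, current_volume) per band
summary = [
    (band, baseline_volume[band], current_volume[band])
    for band in sorted(baseline_volume)
]
print(summary)  # [('A', 2, 1), ('B', 1, 2)]
```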

Parameters:

Name Type Description Default
name str

Name of the metric.

required
dataset LazyFrame | DataFrame

Dataset to compute the psi on.

required
data_format Literal['record_level', 'summary_level']

Format of the input data ("record_level" or "summary_level").

required
laplace_smoothing bool

Whether to apply Laplace smoothing so that bands with zero volume in either period do not cause division-by-zero or ln(0) errors.

False
**kwargs Any

Additional keyword arguments specific to the data format.
  • For record_level: band_column (str), baseline_column (str), current_column (str), segment (list[str], optional)
  • For summary_level: band_column (str), baseline_volume (str), current_volume (str), segment (list[str], optional)

{}
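
A band with zero volume in either period makes ln(Current% / Baseline%) undefined, which is the case smoothing is meant to handle. The sketch below shows one common form, add-one smoothing of the band counts. Whether this module uses exactly this form is an assumption, and `smoothed_psi` and its `alpha` parameter are hypothetical names.

```python
import math


def smoothed_psi(baseline: list[int], current: list[int], alpha: float = 1.0) -> float:
    """psi with add-alpha (Laplace) smoothing of band counts.

    Assumption: alpha is added to every band count before the shares
    are computed. The library's own smoothing may differ.
    """
    k = len(baseline)  # number of bands
    baseline_total = sum(baseline) + alpha * k
    current_total = sum(current) + alpha * k
    psi = 0.0
    for b, c in zip(baseline, current):
        baseline_pct = (b + alpha) / baseline_total
        current_pct = (c + alpha) / current_total
        psi += (current_pct - baseline_pct) * math.log(current_pct / baseline_pct)
    return psi


# Band C is empty in the current period; the unsmoothed formula would hit ln(0)
print(smoothed_psi([50, 30, 20], [60, 40, 0]))
```

Smoothing keeps the result finite at the cost of a small bias, which matters most when band counts are low.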

Returns:

Type Description
DataFrame

DataFrame containing the psi result and associated metadata including:
  • baseline_volume: Total volume in baseline period
  • current_volume: Total volume in current period
  • volume: Total volume across both periods
  • bucket_count: Number of unique bands/buckets
  • psi: Population Stability Index value

Examples:

Record-level usage with period indicators:

import polars as pl

# Combined dataset with period indicators
data = pl.DataFrame({
    "band": ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"],
    "is_baseline": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = baseline
    "is_current": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],   # 1 = current
})

result = population_stability_index(
    name="model_stability_check",
    dataset=data,
    data_format="record_level",
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current"
)

Summary-level usage with explicit volumes:

# Pre-aggregated data with volumes by band
summary_data = pl.DataFrame({
    "band": ["A", "B", "C"],
    "baseline_volume": [1000, 1500, 500],    # Development period volumes
    "current_volume": [1200, 1200, 600]      # Current period volumes
})

result = population_stability_index(
    name="summary_stability_check",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume"
)

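With segmentation and Laplace smoothing:

# The dataset must contain the segment columns; constant values are added
# to `data` here purely for illustration
segmented_data = data.with_columns(
    pl.lit("cards").alias("product"),
    pl.lit("north").alias("region"),
)

result = population_stability_index(
    name="segmented_stability",
    dataset=segmented_data,
    data_format="record_level",
    laplace_smoothing=True,
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current",
    segment=["product", "region"],
)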
