Stability Metrics
Stability metrics for monitoring population shifts and distribution changes over time.
stability
Population Stability Index helper functions.
This module provides helper functions for monitoring population stability.
population_stability_index
population_stability_index(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Population Stability Index (PSI) for record-level or summary-level data.
PSI measures how stable the distribution of predicted probabilities or model scores is over time. It compares the share of the population that falls in each band during a baseline period with the share in the current period.
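PSI formula: PSI = Σ (Current% - Baseline%) * ln(Current% / Baseline%), summed over all bands, where Current% and Baseline% are each band's share of the current and baseline populations.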
PSI interpretation:
- PSI < 0.1: Stable (no significant change)
- 0.1 ≤ PSI < 0.2: Moderate shift (monitor closely)
- PSI ≥ 0.2: Significant shift (investigate/retrain)
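The formula and thresholds above can be checked by hand with a few lines of plain Python. This is a minimal sketch, not the library's implementation: the helper names `psi_from_counts` and `interpret_psi` are hypothetical, and the sketch assumes every band has a non-zero count in both periods.

```python
import math


def psi_from_counts(baseline_counts, current_counts):
    """Sum (Current% - Baseline%) * ln(Current% / Baseline%) over bands.

    Hypothetical helper; assumes no band has a zero count in either period.
    """
    baseline_total = sum(baseline_counts)
    current_total = sum(current_counts)
    total = 0.0
    for baseline, current in zip(baseline_counts, current_counts):
        baseline_pct = baseline / baseline_total
        current_pct = current / current_total
        total += (current_pct - baseline_pct) * math.log(current_pct / baseline_pct)
    return total


def interpret_psi(value):
    """Map a PSI value to the interpretation bands listed above."""
    if value < 0.1:
        return "stable"
    if value < 0.2:
        return "moderate shift"
    return "significant shift"


# Bands A, B, C with baseline volumes 1000/1500/500 and current volumes 1200/1200/600.
value = psi_from_counts([1000, 1500, 500], [1200, 1200, 600])
print(round(value, 4), interpret_psi(value))  # 0.0405 stable
```

Each term is non-negative because the difference and the logarithm always share a sign, so PSI is zero only when the two distributions match exactly.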
Record-level usage (data_format="record_level"). Required parameters: band_column, baseline_column, current_column.
Summary-level usage (data_format="summary_level"). Required parameters: band_column, baseline_volume, current_volume.
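The two formats carry the same information at different granularity. The sketch below shows one way record-level rows could be rolled up into summary-level volumes. It assumes an indicator value of 1 marks membership in a period; whether the library aggregates in exactly this way is an assumption.

```python
from collections import Counter

# One row per record: its band plus 0/1 period indicators.
bands = ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"]
is_baseline = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
is_current = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Count the records per band in each period.
baseline_volume = Counter(band for band, flag in zip(bands, is_baseline) if flag == 1)
current_volume = Counter(band for band, flag in zip(bands, is_current) if flag == 1)

# One row per band: the shape expected by the summary-level format.
summary = [
    {
        "band": band,
        "baseline_volume": baseline_volume[band],
        "current_volume": current_volume[band],
    }
    for band in sorted(set(bands))
]
print(summary)
```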
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the metric. | required |
| dataset | LazyFrame \| DataFrame | Dataset to compute the PSI on. | required |
| data_format | Literal['record_level', 'summary_level'] | Format of the input data ("record_level" or "summary_level"). | required |
| laplace_smoothing | | Whether to apply Laplace smoothing to avoid division-by-zero errors. | required |
| **kwargs | Any | Additional keyword arguments specific to the data format. For record_level: band_column (str), baseline_column (str), current_column (str), segment (optional). For summary_level: band_column (str), baseline_volume (str), current_volume (str), segment (optional). | {} |
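To see why smoothing matters, consider a band that is empty in the baseline period: its Baseline% is zero and the ratio inside the logarithm is undefined. The sketch below uses additive (Laplace) smoothing, adding a pseudo-count `alpha` to every band before computing shares. The function name `smoothed_psi` and the `alpha` parameter are hypothetical, and the exact smoothing scheme applied by laplace_smoothing is not documented here, so treat this as an illustration of the general technique only.

```python
import math


def smoothed_psi(baseline_counts, current_counts, alpha=1.0):
    """PSI with additive smoothing: each band count is increased by alpha.

    Hypothetical helper; alpha=0 gives the unsmoothed PSI.
    """
    n_bands = len(baseline_counts)
    baseline_total = sum(baseline_counts) + alpha * n_bands
    current_total = sum(current_counts) + alpha * n_bands
    total = 0.0
    for baseline, current in zip(baseline_counts, current_counts):
        baseline_pct = (baseline + alpha) / baseline_total
        current_pct = (current + alpha) / current_total
        total += (current_pct - baseline_pct) * math.log(current_pct / baseline_pct)
    return total


baseline = [10, 0, 5]  # the second band is empty in the baseline period
current = [8, 2, 5]

try:
    smoothed_psi(baseline, current, alpha=0.0)
except ZeroDivisionError:
    print("unsmoothed PSI is undefined when a baseline band is empty")

print(smoothed_psi(baseline, current, alpha=1.0))
```

Smoothing changes the reported value slightly even when no band is empty, so smoothed and unsmoothed results are best not compared directly.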
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame containing the PSI result and associated metadata. |
Examples:
Record-level usage with period indicators:
import polars as pl

# Combined dataset with period indicators
data = pl.DataFrame({
    "band": ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"],
    "is_baseline": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = baseline
    "is_current": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # 1 = current
})
result = population_stability_index(
    name="model_stability_check",
    dataset=data,
    data_format="record_level",
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current",
)
Summary-level usage with explicit volumes:
# Pre-aggregated data with volumes by band
summary_data = pl.DataFrame({
    "band": ["A", "B", "C"],
    "baseline_volume": [1000, 1500, 500],  # Development period volumes
    "current_volume": [1200, 1200, 600],  # Current period volumes
})
result = population_stability_index(
    name="summary_stability_check",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume",
)
With segmentation and Laplace smoothing: