Public API Reference¶
This section documents the public interface for the TNP Statistic Library.
Helper Functions Interface¶
The primary way to use the library is through the convenient helper functions that provide a simple, type-safe interface for calculating statistical metrics.
All functions support both record-level and summary-level data formats with automatic validation and optimization.
Organized by Category¶
- Accuracy Metrics - Default accuracy, EAD accuracy, Hosmer-Lemeshow test, Jeffreys test
- Discrimination Metrics - AUC (Area Under Curve), Gini coefficient, Kolmogorov-Smirnov test, F1 score, F2 score
- Normality Testing - Shapiro-Wilk test for distribution normality assessment
- Summary Statistics - Mean, median calculations
Complete Function Reference¶
metrics ¶
Metrics package - Public helper functions for statistical calculations.
This package provides the main public interface for calculating statistical metrics. All metric classes are internal implementations and should not be used directly.
Example usage
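A minimal sketch of the intended pattern. The import path is an assumption based on the package layout shown later in this page (helpers are shown as importable from the `metrics` package); check your installation for the exact path.

```python
import polars as pl
from tnp_statistic_library.metrics import auc  # import path is an assumption

df = pl.DataFrame({
    "probability": [0.1, 0.4, 0.35, 0.8],
    "default_flag": [0, 0, 1, 1],
})
result = auc(
    name="model_auc",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
)
```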
binomial_test ¶
binomial_test(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate binomial test for record-level or summary-level data.
The binomial test is used to test whether an observed proportion of defaults significantly differs from an expected probability under the null hypothesis.
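For intuition, the two-tailed p-value can be sketched in plain Python: sum the probabilities of all outcomes no more likely than the observed one. This is a minimal illustration of the idea, not necessarily the library's exact implementation.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k defaults in n observations."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k_obs: int, n: int, p: float) -> float:
    """Two-tailed p-value: total probability of outcomes no more likely than k_obs."""
    p_obs = binom_pmf(k_obs, n, p)
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= p_obs * (1 + 1e-9)  # tolerance for float comparison
    )


# 8 defaults observed in 100 observations against an expected rate of 5%
p_value = binom_two_sided_p(8, 100, 0.05)
```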
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name identifier for the metric calculation. | required |
| `dataset` | `LazyFrame \| DataFrame` | The input data as a Polars LazyFrame or DataFrame. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data, either "record_level" or "summary_level". | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. | `{}` |

Record-level format kwargs:

- `default`: Column name containing binary default indicators (0/1 or boolean).
- `expected_probability`: Expected probability of default under the null hypothesis (0.0-1.0).
- `segment`: Optional list of column names to group by for segmented analysis.

Summary-level format kwargs:

- `volume`: Column name containing the total number of observations.
- `defaults`: Column name containing the number of defaults.
- `expected_probability`: Expected probability of default under the null hypothesis (0.0-1.0).
- `segment`: Optional list of column names to group by for segmented analysis.

Returns:

| Type | Description |
|---|---|
| `DataFrame` | A `pl.DataFrame` of binomial test results with columns `group_key` (struct of segment columns), `volume` (total observations), `defaults` (observed defaults), `observed_probability` (observed default rate), `expected_probability` (expected rate under the null hypothesis), and `p_value` (two-tailed). |

Examples:

Record-level data:

```python
binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="record_level",
    default="default_flag",
    expected_probability=0.05
)
```
Summary-level data:
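Based on the summary-level kwargs listed above, a call might look like this (column names are illustrative placeholders):

```python
binomial_test(
    name="default_rate_test",
    dataset=summary_data,
    data_format="summary_level",
    volume="volume",
    defaults="defaults",
    expected_probability=0.05
)
```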
default_accuracy ¶
default_accuracy(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate default accuracy for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the default accuracy on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing default accuracy metrics for each group. |

Examples:

Record-level usage:

```python
result = default_accuracy(
    name="model_accuracy",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Summary-level usage:
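Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = default_accuracy(
    name="model_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
```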
ead_accuracy ¶
ead_accuracy(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
predicted_ead: str,
actual_ead: str,
**kwargs: Any,
) -> pl.DataFrame
Calculate EAD accuracy for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameter `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the EAD accuracy on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `predicted_ead` | `str` | Column containing predicted EAD values. | required |
| `actual_ead` | `str` | Column containing actual EAD values. | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `default` (str), `segment` (optional). For summary_level: `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing EAD accuracy metrics for each group. |

Examples:

Record-level usage:

```python
result = ead_accuracy(
    name="ead_model_accuracy",
    dataset=df,
    data_format="record_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    default="default_flag"
)
```
Summary-level usage:
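Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = ead_accuracy(
    name="ead_model_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    defaults="defaults",
    volume="volume"
)
```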
hosmer_lemeshow ¶
hosmer_lemeshow(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Hosmer-Lemeshow metric for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the Hosmer-Lemeshow test on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `bands` (int, default=10), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `bands` (int, default=10), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the Hosmer-Lemeshow test result and associated metadata. |

Examples:

Record-level usage:

```python
result = hosmer_lemeshow(
    name="hl_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    bands=10
)
```
Summary-level usage:
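Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = hosmer_lemeshow(
    name="hl_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    bands=10
)
```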
jeffreys_test ¶
jeffreys_test(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Jeffreys test metric for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the Jeffreys test on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the Jeffreys test result and associated metadata. |

Examples:

Record-level usage:

```python
result = jeffreys_test(
    name="jeffreys_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Summary-level usage:
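Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = jeffreys_test(
    name="jeffreys_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
```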
mape ¶
mape(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate Mean Absolute Percentage Error (MAPE) for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `observed`, `predicted`.
Summary-level usage (`data_format="summary_level"`): required parameters `volume`, `sum_absolute_percentage_errors`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the MAPE on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `observed` (str), `predicted` (str), `segment` (optional). For summary_level: `volume` (str), `sum_absolute_percentage_errors` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing MAPE metrics for each group. |

Examples:

Record-level usage:

```python
result = mape(
    name="model_mape",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)
```
Summary-level usage:
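Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = mape(
    name="model_mape",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_absolute_percentage_errors="sum_abs_pct_errors"
)
```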
rmse ¶
rmse(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate Root Mean Squared Error (RMSE) for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `observed`, `predicted`.
Summary-level usage (`data_format="summary_level"`): required parameters `volume`, `sum_squared_errors`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the RMSE on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `observed` (str), `predicted` (str), `segment` (optional). For summary_level: `volume` (str), `sum_squared_errors` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing RMSE metrics for each group. |

Examples:

Record-level usage:

```python
result = rmse(
    name="model_rmse",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)
```
Summary-level usage:
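Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = rmse(
    name="model_rmse",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_squared_errors="sum_squared_errors"
)
```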
ttest ¶
ttest(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate T-test statistics for record-level or summary-level data.
Performs a one-sample t-test to determine if the mean difference between observed and predicted values is significantly different from a null hypothesis mean.
Record-level usage (`data_format="record_level"`): required parameters `observed`, `predicted`; optional `null_hypothesis_mean` (default: 0.0).
Summary-level usage (`data_format="summary_level"`): required parameters `volume`, `sum_differences`, `sum_squared_differences`; optional `null_hypothesis_mean` (default: 0.0).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the T-test on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `observed` (str), `predicted` (str), `null_hypothesis_mean` (float), `segment` (optional). For summary_level: `volume` (str), `sum_differences` (str), `sum_squared_differences` (str), `null_hypothesis_mean` (float), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing T-test statistics for each group. |

Examples:

Record-level usage:

```python
result = ttest(
    name="model_ttest",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)
```
Summary-level usage:
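Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = ttest(
    name="model_ttest",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_differences="sum_differences",
    sum_squared_differences="sum_squared_differences",
    null_hypothesis_mean=0.0
)
```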
auc ¶
auc(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Area Under the ROC Curve (AUC) for record-level or summary-level data.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the AUC on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the AUC result and associated metadata. |

Examples:

Record-level usage:

```python
result = auc(
    name="model_auc",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Summary-level usage:
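Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = auc(
    name="model_auc",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
```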
f1_score ¶
f1_score(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the F1 score for record-level or summary-level data.
The F1 score is the harmonic mean of precision and recall, providing a balanced measure of classification performance.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`; optional `threshold` (default 0.5).
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`; optional `threshold` (default 0.5).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the F1 score on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `threshold` (float), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `threshold` (float), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the F1 score and associated metrics for each group. |

Examples:

Record-level usage:

```python
result = f1_score(
    name="model_f1",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.6
)
```
Summary-level usage:
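Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = f1_score(
    name="model_f1",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.6
)
```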
f2_score ¶
f2_score(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the F2 score for record-level or summary-level data.
The F2 score weights recall higher than precision, making it suitable for scenarios where missing positive cases (false negatives) is more costly than false positives.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`; optional `threshold` (default 0.5).
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`; optional `threshold` (default 0.5).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the F2 score on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `threshold` (float), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `threshold` (float), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the F2 score and associated metrics for each group. |

Examples:

Record-level usage:

```python
result = f2_score(
    name="model_f2",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.3
)
```
Summary-level usage:
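Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = f2_score(
    name="model_f2",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.3
)
```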
gini ¶
gini(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Gini coefficient for record-level or summary-level data.
The Gini coefficient is calculated as `2*AUC - 1`, where AUC is the Area Under the ROC Curve. It ranges from -1 to 1, where:
- 1 indicates perfect discrimination
- 0 indicates no discrimination (random)
- -1 indicates perfectly inverse discrimination
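The 2*AUC - 1 relationship can be checked with a tiny pairwise sketch (illustrative only; it is not how the library computes AUC internally):

```python
def auc_from_pairs(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random defaulter is scored above a
    random non-defaulter, with ties counting half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
auc_value = auc_from_pairs(scores, labels)
gini_value = 2 * auc_value - 1  # Gini derived from AUC
```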
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the Gini coefficient on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the Gini coefficient result and associated metadata. |

Examples:

Record-level usage:

```python
result = gini(
    name="model_gini",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Summary-level usage:
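Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = gini(
    name="model_gini",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
```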
kolmogorov_smirnov ¶
kolmogorov_smirnov(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Kolmogorov-Smirnov statistic for record-level or summary-level data.
The Kolmogorov-Smirnov statistic measures the maximum difference between the cumulative distribution functions of predicted scores for defaulters vs non-defaulters. It ranges from 0 to 1, where higher values indicate better discrimination.
Record-level usage (`data_format="record_level"`): required parameters `prob_def`, `default`.
Summary-level usage (`data_format="summary_level"`): required parameters `mean_pd`, `defaults`, `volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the KS statistic on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `prob_def` (str), `default` (str), `segment` (optional). For summary_level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the KS statistic, p-value, and associated metadata. |

Examples:

Record-level usage:

```python
result = kolmogorov_smirnov(
    name="model_ks",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Summary-level usage:
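Mirroring the record-level example, with illustrative column names for the required summary-level parameters:

```python
result = kolmogorov_smirnov(
    name="model_ks",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
```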
shapiro_wilk ¶
shapiro_wilk(
*,
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
segment: SegmentCol = None,
**kwargs,
) -> ShapiroWilk
Compute the Shapiro-Wilk test for normality.
The Shapiro-Wilk test is a statistical test to assess whether a dataset follows a normal distribution. It is considered one of the most powerful normality tests, especially for small to medium sample sizes.
The test returns:
- `statistic`: The test statistic (W), ranges from 0 to 1
- `p_value`: The p-value for the test
- `volume`: The number of observations used in the test

Interpretation guidelines:
- The null hypothesis (H0) assumes the data follows a normal distribution
- The alternative hypothesis (H1) assumes the data does not follow a normal distribution
- Compare `p_value` to your chosen significance level (alpha):
  - If `p_value` < alpha: evidence against normality (reject H0)
  - If `p_value` >= alpha: insufficient evidence against normality (fail to reject H0)
- Common alpha values: 0.05 (5%), 0.01 (1%), or 0.10 (10%)

Limitations:
- Requires at least 3 observations
- Maximum sample size is 5000 (scipy limitation)
- Sensitive to outliers and ties in the data
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name identifier for this metric instance. | required |
| `dataset` | `LazyFrame \| DataFrame` | The input dataset as either a LazyFrame or DataFrame. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | The format of the input data. | required |
| `segment` | `SegmentCol` | Optional list of column names to use for segmentation/grouping. | `None` |
| `**kwargs` | | Additional arguments based on `data_format`. | `{}` |

Record-level format args:

- `data_column`: The column containing the data to test for normality.

Summary-level format args:

- `volume`: The column containing the count of observations.
- `statistic`: The column containing pre-computed Shapiro-Wilk statistics.
- `p_value`: The column containing pre-computed p-values.

Returns:

| Type | Description |
|---|---|
| `ShapiroWilk` | A ShapiroWilk metric instance ready for computation. |
Examples:
Record-level usage:
```python
>>> import polars as pl
>>> from tnp_statistic_library.metrics.normality import shapiro_wilk
>>>
>>> # Create sample data
>>> df = pl.DataFrame({
...     "values": [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2],
...     "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
... })
>>>
>>> # Test normality for each group
>>> metric = shapiro_wilk(
...     name="data_normality",
...     dataset=df,
...     data_format="record_level",
...     data_column="values",
...     segment=["group"]
... )
>>> result = metric.run_metric().collect()
```
Summary-level usage:
```python
>>> df_summary = pl.DataFrame({
...     "volume": [50, 45],
...     "statistic": [0.95, 0.92],
...     "p_value": [0.06, 0.03],
...     "region": ["North", "South"]
... })
>>>
>>> metric = shapiro_wilk(
...     name="regional_normality",
...     dataset=df_summary,
...     data_format="summary_level",
...     volume="volume",
...     statistic="statistic",
...     p_value="p_value",
...     segment=["region"]
... )
>>> result = metric.run_metric().collect()
```
population_stability_index ¶
population_stability_index(
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
**kwargs: Any,
) -> pl.DataFrame
Calculate the Population Stability Index (PSI) for record-level or summary-level data.
The Population Stability Index measures the distributional stability of predicted probabilities or model scores over time by comparing the distribution across bands between baseline and current periods.
PSI formula: `PSI = Σ (Current% - Baseline%) * ln(Current% / Baseline%)`
PSI interpretation
- PSI < 0.1: Stable (no significant change)
- 0.1 ≤ PSI < 0.2: Moderate shift (monitor closely)
- PSI ≥ 0.2: Significant shift (investigate/retrain)
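The formula above can be sketched directly in plain Python (a minimal illustration that omits smoothing, so a zero count in either period would raise a math error):

```python
from math import log


def psi(baseline_counts: list[int], current_counts: list[int]) -> float:
    """PSI = sum over bands of (cur% - base%) * ln(cur% / base%)."""
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    total = 0.0
    for b, c in zip(baseline_counts, current_counts):
        b_pct = b / b_total  # baseline share of this band
        c_pct = c / c_total  # current share of this band
        total += (c_pct - b_pct) * log(c_pct / b_pct)
    return total


# Identical band distributions give a PSI of zero
print(psi([1000, 1500, 500], [2000, 3000, 1000]))  # 0.0
```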
Record-level usage (`data_format="record_level"`): required parameters `band_column`, `baseline_column`, `current_column`.
Summary-level usage (`data_format="summary_level"`): required parameters `band_column`, `baseline_volume`, `current_volume`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the PSI on. | required |
| `data_format` | `Literal["record_level", "summary_level"]` | Format of the input data ("record_level" or "summary_level"). | required |
| `laplace_smoothing` | `bool` | Whether to apply Laplace smoothing to avoid zero division errors. | required |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. For record_level: `band_column` (str), `baseline_column` (str), `current_column` (str), `segment` (optional). For summary_level: `band_column` (str), `baseline_volume` (str), `current_volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the PSI result and associated metadata. |
Examples:
Record-level usage with period indicators:

```python
# Combined dataset with period indicators
data = pl.DataFrame({
    "band": ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"],
    "is_baseline": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = baseline
    "is_current": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # 1 = current
})
result = population_stability_index(
    name="model_stability_check",
    dataset=data,
    data_format="record_level",
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current"
)
```
Summary-level usage with explicit volumes:

```python
# Pre-aggregated data with volumes by band
summary_data = pl.DataFrame({
    "band": ["A", "B", "C"],
    "baseline_volume": [1000, 1500, 500],  # Development period volumes
    "current_volume": [1200, 1200, 600]  # Current period volumes
})
result = population_stability_index(
    name="summary_stability_check",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume"
)
```
With segmentation and laplace smoothing:
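A sketch combining both options; the `laplace_smoothing=True` flag is shown as a boolean based on the parameter description above, and the `region` column is an illustrative segment:

```python
result = population_stability_index(
    name="stability_by_region",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume",
    segment=["region"],
    laplace_smoothing=True
)
```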
mean ¶
mean(
name: str,
dataset: LazyFrame | DataFrame,
variable: str,
segment: list[str] | None = None,
) -> pl.DataFrame
Calculate the mean summary for the given dataset and parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the mean on. | required |
| `variable` | `str` | Column name for which to compute the mean. | required |
| `segment` | `list[str] \| None` | Segmentation groups for calculation. | `None` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the mean summary and associated metadata. |
median ¶
median(
name: str,
dataset: LazyFrame | DataFrame,
variable: str,
segment: list[str] | None = None,
) -> pl.DataFrame
Calculate the median summary for the given dataset and parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the metric. | required |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the median on. | required |
| `variable` | `str` | Column name for which to compute the median. | required |
| `segment` | `list[str] \| None` | Segmentation groups for calculation. | `None` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the median summary and associated metadata. |
Workflows Interface¶
For batch processing and YAML-driven configurations, use the workflows module:
- Workflows Interface - `load_configuration_from_yaml()` function for YAML-based metric execution
Data Formats¶
The library supports two main data formats to accommodate different analysis scenarios:
Record-Level Data¶
Each row represents an individual observation (customer, loan, transaction):
- Best for: Raw model outputs, individual predictions, detailed analysis
- Performance: Optimal for large datasets with Polars lazy evaluation
- Segmentation: Full flexibility for grouping and filtering
Example columns:
- `probability`: Individual probability of default (0.0-1.0)
- `default_flag`: Binary outcome (0/1 or boolean)
- `predicted_ead`: Individual predicted exposure at default
- `actual_ead`: Individual actual exposure at default
Summary-Level Data¶
Each row represents pre-aggregated statistics for a segment:
- Best for: Portfolio summaries, pre-computed statistics, reporting
- Performance: Fast calculations on aggregated data
- Segmentation: Limited to existing segment definitions
Example columns:
- `mean_pd`: Mean probability of default for the segment (0.0-1.0)
- `defaults`: Count of defaults in the segment (positive numbers or None for most metrics)
- `volume`: Total number of observations in the segment (positive numbers or None for most metrics)
Segmentation¶
All metrics support flexible segmentation through the segment parameter:
Basic Segmentation¶
```python
# Group by single column
result = default_accuracy(
    name="accuracy_by_region",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    segment=["region"]
)
```
Multi-Level Segmentation¶
```python
# Group by multiple columns
result = mean(
    name="exposure_by_region_product",
    dataset=df,
    variable="exposure_amount",
    segment=["region", "product_type"]
)
```
Performance¶
Optimization Tips¶
- Use Summary-Level Data: pre-aggregated inputs mean far fewer rows to scan, so calculations are generally faster
- Lazy Evaluation: Datasets are processed efficiently with lazy evaluation
- Batch Operations: Workflows execute multiple metrics in parallel
- Memory Management: Large datasets are streamed rather than loaded entirely
Best Practices¶
```python
# Efficient: let Polars handle the optimization
result = default_accuracy(
    name="accuracy",
    dataset=large_df.lazy(),  # Use lazy frames for large data
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```

```python
# Less efficient: pre-filtering reduces optimization opportunities
filtered_df = large_df.filter(pl.col("region") == "North")
result = default_accuracy(
    name="accuracy",
    dataset=filtered_df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)
```
Memory Considerations¶
- Large Datasets: Use `pl.scan_csv()` or similar scan functions
- Multiple Metrics: Use YAML workflows for batch processing
- Segmentation: Prefer single-pass segmentation over multiple separate calls