
Metrics Helper Functions

The metrics module provides convenient helper functions for calculating statistical metrics, offering a simple, direct interface for interactive use.

All metric implementations are internal; users should interact only with these helper functions.

Available Metrics

Accuracy Metrics

  • binomial_test() - Test whether an observed default rate differs from an expected probability
  • default_accuracy() - Calculate default accuracy for binary classification models
  • ead_accuracy() - Calculate Exposure at Default (EAD) accuracy
  • hosmer_lemeshow() - Perform Hosmer-Lemeshow goodness-of-fit test
  • jeffreys_test() - Perform Jeffreys Bayesian calibration test
  • mape() - Calculate Mean Absolute Percentage Error for predicted vs observed values
  • rmse() - Calculate Root Mean Squared Error for predicted vs observed values
  • ttest() - Perform a one-sample t-test on observed vs predicted differences

Discrimination Metrics

  • auc() - Calculate Area Under the ROC Curve
  • f1_score() - Calculate the F1 score (harmonic mean of precision and recall)
  • f2_score() - Calculate the F2 score (recall-weighted F-score)
  • gini() - Calculate the Gini coefficient (2*AUC - 1)
  • kolmogorov_smirnov() - Calculate Kolmogorov-Smirnov statistic for discrimination testing

Normality and Stability Metrics

  • shapiro_wilk() - Perform the Shapiro-Wilk test for normality
  • population_stability_index() - Calculate the Population Stability Index (PSI)

Summary Statistics

  • mean() - Calculate mean values with optional segmentation
  • median() - Calculate median values with optional segmentation

Function Reference

metrics

Metrics package - Public helper functions for statistical calculations.

This package provides the main public interface for calculating statistical metrics. All metric classes are internal implementations and should not be used directly.

Example usage
from tnp_statistic_library.metrics import default_accuracy, mean, median

result = default_accuracy(
    name="test", dataset=df, data_format="record_level",
    prob_def="prob", default="default"
)

binomial_test

binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    default: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    defaults: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate binomial test for record-level or summary-level data.

The binomial test is used to test whether an observed proportion of defaults significantly differs from an expected probability under the null hypothesis.
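
For intuition, the sketch below shows the quantity such a test evaluates, using scipy.stats.binomtest on hypothetical counts; the library's internal implementation and output layout may differ.

# A minimal sketch of a two-tailed binomial test (not the library's internals).
from scipy.stats import binomtest

n_observations = 2_000        # hypothetical portfolio size
n_defaults = 120              # hypothetical observed defaults
expected_probability = 0.05   # default rate under the null hypothesis

result = binomtest(k=n_defaults, n=n_observations, p=expected_probability,
                   alternative="two-sided")
print(result.statistic)  # observed default rate, 120 / 2000 = 0.06
print(result.pvalue)     # two-tailed p-value, compared against your alpha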

Parameters:

  • name (str, required): Name identifier for the metric calculation.
  • dataset (LazyFrame | DataFrame, required): The input data as a Polars LazyFrame or DataFrame.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data, either "record_level" or "summary_level".
  • **kwargs (Any): Additional keyword arguments specific to the data format.

Record-level format kwargs

  • default: Column name containing binary default indicators (0/1 or boolean).
  • expected_probability: Expected probability of default under the null hypothesis (0.0-1.0).
  • segment: Optional list of column names to group by for segmented analysis.

Summary-level format kwargs

  • volume: Column name containing the total number of observations.
  • defaults: Column name containing the number of defaults.
  • expected_probability: Expected probability of default under the null hypothesis (0.0-1.0).
  • segment: Optional list of column names to group by for segmented analysis.

Returns:

pl.DataFrame: A DataFrame containing binomial test results with columns:

  • group_key: Grouping information (struct of segment columns)
  • volume: Total number of observations
  • defaults: Number of observed defaults
  • observed_probability: Observed default rate
  • expected_probability: Expected default rate under the null hypothesis
  • p_value: Two-tailed p-value from the binomial test

Examples:

Record-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="record_level",
    default="default_flag",
    expected_probability=0.05
)

Summary-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="summary_level",
    volume="total_accounts",
    defaults="default_count",
    expected_probability=0.05
)

default_accuracy

default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate default accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the default accuracy on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing default accuracy metrics for each group.

Examples:

Record-level usage:

result = default_accuracy(
    name="model_accuracy",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = default_accuracy(
    name="portfolio_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

ead_accuracy

ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    predicted_ead: str,
    actual_ead: str,
    **kwargs: Any,
) -> pl.DataFrame

Calculate EAD accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: default

Summary-level usage (data_format="summary_level"): Required parameters: defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the EAD accuracy on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • predicted_ead (str, required): Column containing predicted EAD values.
  • actual_ead (str, required): Column containing actual EAD values.
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: default (str), segment (optional).
    For summary_level: defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing EAD accuracy metrics for each group.

Examples:

Record-level usage:

result = ead_accuracy(
    name="ead_model_accuracy",
    dataset=df,
    data_format="record_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    default="default_flag"
)

Summary-level usage:

result = ead_accuracy(
    name="portfolio_ead_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    defaults="defaults",
    volume="volume"
)

hosmer_lemeshow

hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Hosmer-Lemeshow metric for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
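
For reference, a hedged sketch of the classic Hosmer-Lemeshow chi-square statistic on pre-banded, summary-level style inputs (all values hypothetical); the library's banding logic and output may differ.

# One common formulation: H = sum over bands of (O - E)^2 / (N * pd * (1 - pd)).
import numpy as np
from scipy.stats import chi2

volume = np.array([500, 500, 500, 500])        # hypothetical band volumes
defaults = np.array([10, 22, 45, 80])          # hypothetical observed defaults per band
mean_pd = np.array([0.02, 0.05, 0.08, 0.15])   # hypothetical mean PD per band

expected = volume * mean_pd
hl_stat = np.sum((defaults - expected) ** 2 / (volume * mean_pd * (1 - mean_pd)))
dof = len(volume) - 2                          # common convention: bands - 2
p_value = chi2.sf(hl_stat, dof)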

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Hosmer-Lemeshow test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), bands (int, default=10), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), bands (int, default=10), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the Hosmer-Lemeshow test result and associated metadata.

Examples:

Record-level usage:

result = hosmer_lemeshow(
    name="hl_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    bands=10
)

Summary-level usage:

result = hosmer_lemeshow(
    name="portfolio_hl_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    bands=10
)

jeffreys_test

jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Jeffreys test metric for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
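
For reference, a hedged sketch of the ECB-style Jeffreys calibration test, which is one common formulation (all values hypothetical); the library's exact computation and output columns may differ.

# Under a Jeffreys prior Beta(1/2, 1/2), the posterior for the true default
# rate given D defaults in N observations is Beta(D + 1/2, N - D + 1/2).
from scipy.stats import beta

volume = 2_000        # hypothetical number of observations
defaults = 120        # hypothetical observed defaults
mean_pd = 0.05        # hypothetical mean predicted PD

posterior = beta(defaults + 0.5, volume - defaults + 0.5)

# p-value commonly reported: posterior probability that the true default
# rate is at or below the predicted PD.
p_value = posterior.cdf(mean_pd)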

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Jeffreys test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the Jeffreys test result and associated metadata.

Examples:

Record-level usage:

result = jeffreys_test(
    name="jeffreys_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = jeffreys_test(
    name="portfolio_jeffreys_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

mape

mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_absolute_percentage_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Mean Absolute Percentage Error (MAPE) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_absolute_percentage_errors
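
The two formats plausibly relate as follows; a minimal sketch with hypothetical values (the library's handling of edge cases such as zero observed values may differ).

# Record-level MAPE averages absolute percentage errors; the summary-level
# form divides a pre-aggregated sum of those errors by the volume.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

abs_pct_errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
record_level_mape = sum(abs_pct_errors) / len(abs_pct_errors)      # ≈ 0.0667

sum_absolute_percentage_errors = sum(abs_pct_errors)
volume = len(abs_pct_errors)
summary_level_mape = sum_absolute_percentage_errors / volume       # same value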

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the MAPE on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), segment (optional).
    For summary_level: volume (str), sum_absolute_percentage_errors (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing MAPE metrics for each group.

Examples:

Record-level usage:

result = mape(
    name="model_mape",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = mape(
    name="portfolio_mape",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_absolute_percentage_errors="sum_absolute_percentage_errors"
)

rmse

rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_squared_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Root Mean Squared Error (RMSE) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_squared_errors
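
A minimal sketch of how the two formats plausibly relate, with hypothetical values; the summary-level inputs suggest an RMSE of sqrt(sum_squared_errors / volume).

import math

observed = [1.0, 2.0, 4.0]
predicted = [1.5, 1.5, 4.5]

squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
record_level_rmse = math.sqrt(sum(squared_errors) / len(squared_errors))   # 0.5

# Summary-level form: the squared errors arrive pre-summed.
sum_squared_errors = sum(squared_errors)
volume = len(squared_errors)
summary_level_rmse = math.sqrt(sum_squared_errors / volume)                # same value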

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the RMSE on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), segment (optional).
    For summary_level: volume (str), sum_squared_errors (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing RMSE metrics for each group.

Examples:

Record-level usage:

result = rmse(
    name="model_rmse",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = rmse(
    name="portfolio_rmse",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_squared_errors="sum_squared_errors"
)

ttest

ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_differences: str,
    sum_squared_differences: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate T-test statistics for record-level or summary-level data.

Performs a one-sample t-test to determine if the mean difference between observed and predicted values is significantly different from a null hypothesis mean.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted. Optional parameters: null_hypothesis_mean (default: 0.0).

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_differences, sum_squared_differences. Optional parameters: null_hypothesis_mean (default: 0.0).
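
For intuition, a minimal sketch of the same one-sample test applied to the observed-minus-predicted differences using scipy (values hypothetical); the library's output columns may differ.

import numpy as np
from scipy.stats import ttest_1samp

observed = np.array([1.2, 0.9, 1.1, 1.4, 0.8])    # hypothetical values
predicted = np.array([1.0, 1.0, 1.0, 1.0, 1.0])

differences = observed - predicted
result = ttest_1samp(differences, popmean=0.0)     # null_hypothesis_mean = 0.0
print(result.statistic, result.pvalue)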

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the T-test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), null_hypothesis_mean (float), segment (optional).
    For summary_level: volume (str), sum_differences (str), sum_squared_differences (str), null_hypothesis_mean (float), segment (optional).

Returns:

pl.DataFrame: DataFrame containing T-test statistics for each group.

Examples:

Record-level usage:

result = ttest(
    name="model_ttest",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = ttest(
    name="portfolio_ttest",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_differences="sum_differences",
    sum_squared_differences="sum_squared_differences"
)

auc

auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Area Under the ROC Curve (auc) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
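
A hedged sketch of AUC via its rank-statistic (Mann-Whitney) equivalence on hypothetical record-level data; the library may use a different estimator.

import numpy as np
from scipy.stats import rankdata

prob_def = np.array([0.8, 0.7, 0.4, 0.2, 0.9, 0.3])   # hypothetical predicted PDs
default = np.array([1, 1, 0, 0, 1, 1])                # hypothetical default flags

ranks = rankdata(prob_def)                 # average ranks, ties handled
n_pos = default.sum()
n_neg = len(default) - n_pos
u_stat = ranks[default == 1].sum() - n_pos * (n_pos + 1) / 2
auc_value = u_stat / (n_pos * n_neg)       # 0.875 for these values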

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the auc on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the auc result and associated metadata.

Examples:

Record-level usage:

result = auc(
    name="model_auc",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = auc(
    name="portfolio_auc",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

f1_score

f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the F1 score for record-level or summary-level data.

The F1 score is the harmonic mean of precision and recall, providing a balanced measure of classification performance.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default. Optional parameters: threshold (default 0.5).

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume. Optional parameters: threshold (default 0.5).
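
A hedged sketch of the record-level calculation, assuming probabilities above the threshold are treated as predicted defaults (data values hypothetical); the library's implementation may differ in detail.

probabilities = [0.8, 0.7, 0.4, 0.2, 0.9, 0.3]   # hypothetical prob_def column
defaults = [1, 1, 0, 0, 1, 1]                    # hypothetical default column
threshold = 0.5

predictions = [p > threshold for p in probabilities]
tp = sum(pred and d == 1 for pred, d in zip(predictions, defaults))
fp = sum(pred and d == 0 for pred, d in zip(predictions, defaults))
fn = sum(not pred and d == 1 for pred, d in zip(predictions, defaults))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean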

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the F1 score on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), threshold (float), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), threshold (float), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the F1 score and associated metrics for each group.

Examples:

Record-level usage:

result = f1_score(
    name="model_f1",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.6
)

Summary-level usage:

result = f1_score(
    name="portfolio_f1",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.4
)

f2_score

f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the F2 score for record-level or summary-level data.

The F2 score weights recall higher than precision, making it suitable for scenarios where missing positive cases (false negatives) is more costly than false positives.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default. Optional parameters: threshold (default 0.5).

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume. Optional parameters: threshold (default 0.5).
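
For reference, the F2 score is the F-beta score with beta = 2, which weights recall four times as heavily as precision; a quick illustrative calculation with hypothetical precision and recall values:

precision, recall = 0.60, 0.90
beta = 2.0
f2 = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
# (1 + 4) * 0.54 / (4 * 0.60 + 0.90) = 2.7 / 3.3 ≈ 0.818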

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the F2 score on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), threshold (float), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), threshold (float), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the F2 score and associated metrics for each group.

Examples:

Record-level usage:

result = f2_score(
    name="model_f2",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.3
)

Summary-level usage:

result = f2_score(
    name="portfolio_f2",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.7
)

gini

gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Gini coefficient for record-level or summary-level data.

The Gini coefficient is calculated as 2*AUC - 1, where AUC is the Area Under the ROC Curve. It ranges from -1 to 1, where:

  • 1 indicates perfect discrimination
  • 0 indicates no discrimination (random)
  • -1 indicates perfectly inverse discrimination

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
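
Given the relationship above, converting an AUC value is a one-liner; a quick illustration with a hypothetical AUC:

auc_value = 0.75                 # hypothetical AUC
gini_value = 2 * auc_value - 1   # 0.50, i.e. moderate discrimination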

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Gini coefficient on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the Gini coefficient result and associated metadata.

Examples:

Record-level usage:

result = gini(
    name="model_gini",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = gini(
    name="portfolio_gini",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

kolmogorov_smirnov

kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Kolmogorov-Smirnov statistic for record-level or summary-level data.

The Kolmogorov-Smirnov statistic measures the maximum difference between the cumulative distribution functions of predicted scores for defaulters vs non-defaulters. It ranges from 0 to 1, where higher values indicate better discrimination.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
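
A hedged sketch of the two-sample KS statistic on record-level style data using scipy (scores simulated purely for illustration); the library's estimator and p-value may differ.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_defaulters = rng.beta(4, 2, size=500)       # hypothetical PDs, defaulters
scores_non_defaulters = rng.beta(2, 4, size=4500)  # hypothetical PDs, non-defaulters

result = ks_2samp(scores_defaulters, scores_non_defaulters)
print(result.statistic, result.pvalue)  # maximum CDF gap and its p-value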

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the KS statistic on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the KS statistic, p-value, and associated metadata.

Examples:

Record-level usage:

result = kolmogorov_smirnov(
    name="model_ks",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = kolmogorov_smirnov(
    name="portfolio_ks",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

shapiro_wilk

shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    data_column: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    volume: str,
    statistic: str,
    p_value: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    segment: SegmentCol = None,
    **kwargs,
) -> ShapiroWilk

Compute the Shapiro-Wilk test for normality.

The Shapiro-Wilk test is a statistical test to assess whether a dataset follows a normal distribution. It is considered one of the most powerful normality tests, especially for small to medium sample sizes.

The test returns:

  • statistic: The test statistic (W), ranging from 0 to 1
  • p_value: The p-value for the test
  • volume: The number of observations used in the test

Interpretation guidelines:

  • The null hypothesis (H0) assumes the data follows a normal distribution.
  • The alternative hypothesis (H1) assumes the data does not follow a normal distribution.
  • Compare p_value to your chosen significance level (alpha):
    If p_value < alpha: evidence against normality (reject H0).
    If p_value >= alpha: insufficient evidence against normality (fail to reject H0).
  • Common alpha values: 0.05 (5%), 0.01 (1%), or 0.10 (10%).

Limitations:

  • Requires at least 3 observations
  • Maximum sample size is 5000 (scipy limitation)
  • Sensitive to outliers and ties in the data

Parameters:

  • name (str, required): The name identifier for this metric instance.
  • dataset (LazyFrame | DataFrame, required): The input dataset as either a LazyFrame or DataFrame.
  • data_format (Literal["record_level", "summary_level"], required): The format of the input data.
  • segment (SegmentCol, default None): Optional list of column names to use for segmentation/grouping.
  • **kwargs: Additional arguments based on data_format.

Record-level format args

  • data_column: The column containing the data to test for normality.

Summary-level format args

  • volume: The column containing the count of observations.
  • statistic: The column containing pre-computed Shapiro-Wilk statistics.
  • p_value: The column containing pre-computed p-values.

Returns:

ShapiroWilk: A ShapiroWilk metric instance ready for computation.

Examples:

Record-level usage:

>>> import polars as pl
>>> from tnp_statistic_library.metrics.normality import shapiro_wilk
>>>
>>> # Create sample data
>>> df = pl.DataFrame({
...     "values": [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2],
...     "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
... })
>>>
>>> # Test normality for each group
>>> metric = shapiro_wilk(
...     name="data_normality",
...     dataset=df,
...     data_format="record_level",
...     data_column="values",
...     segment=["group"]
... )
>>> result = metric.run_metric().collect()

Summary-level usage:

>>> df_summary = pl.DataFrame({
...     "volume": [50, 45],
...     "statistic": [0.95, 0.92],
...     "p_value": [0.06, 0.03],
...     "region": ["North", "South"]
... })
>>>
>>> metric = shapiro_wilk(
...     name="regional_normality",
...     dataset=df_summary,
...     data_format="summary_level",
...     volume="volume",
...     statistic="statistic",
...     p_value="p_value",
...     segment=["region"]
... )
>>> result = metric.run_metric().collect()

population_stability_index

population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    band_column: str,
    baseline_column: str,
    current_column: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    band_column: str,
    baseline_volume: str,
    current_volume: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Population Stability Index (PSI) for record-level or summary-level data.

The Population Stability Index measures the distributional stability of predicted probabilities or model scores over time by comparing the distribution across bands between a baseline period and a current period.

PSI formula: Σ (Current% - Baseline%) * ln(Current% / Baseline%)

PSI interpretation
  • PSI < 0.1: Stable (no significant change)
  • 0.1 ≤ PSI < 0.2: Moderate shift (monitor closely)
  • PSI ≥ 0.2: Significant shift (investigate/retrain)

Record-level usage (data_format="record_level"): Required parameters: band_column, baseline_column, current_column

Summary-level usage (data_format="summary_level"): Required parameters: band_column, baseline_volume, current_volume
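
As a sanity check, a minimal sketch of the formula above applied to the band volumes used in the summary-level example further down; the helper itself returns additional metadata columns.

import numpy as np

baseline_volume = np.array([1000, 1500, 500])
current_volume = np.array([1200, 1200, 600])

baseline_pct = baseline_volume / baseline_volume.sum()
current_pct = current_volume / current_volume.sum()

psi = np.sum((current_pct - baseline_pct) * np.log(current_pct / baseline_pct))
# ≈ 0.041 for these volumes: below 0.1, so the distribution looks stable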

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the PSI on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data ("record_level" or "summary_level").
  • laplace_smoothing (bool, default False): Whether to apply Laplace smoothing to avoid zero division errors.
  • **kwargs (Any): Additional keyword arguments specific to the data format.
    For record_level: band_column (str), baseline_column (str), current_column (str), segment (optional).
    For summary_level: band_column (str), baseline_volume (str), current_volume (str), segment (optional).

Returns:

pl.DataFrame: DataFrame containing the PSI result and associated metadata including:

  • baseline_volume: Total volume in baseline period
  • current_volume: Total volume in current period
  • volume: Total volume across both periods
  • bucket_count: Number of unique bands/buckets
  • psi: Population Stability Index value

Examples:

Record-level usage with period indicators:

# Combined dataset with period indicators
data = pl.DataFrame({
    "band": ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"],
    "is_baseline": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = baseline
    "is_current": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],   # 1 = current
})

result = population_stability_index(
    name="model_stability_check",
    dataset=data,
    data_format="record_level",
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current"
)

Summary-level usage with explicit volumes:

# Pre-aggregated data with volumes by band
summary_data = pl.DataFrame({
    "band": ["A", "B", "C"],
    "baseline_volume": [1000, 1500, 500],    # Development period volumes
    "current_volume": [1200, 1200, 600]      # Current period volumes
})

result = population_stability_index(
    name="summary_stability_check",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume"
)

With segmentation and laplace smoothing:

result = population_stability_index(
    name="segmented_stability",
    dataset=data,
    data_format="record_level",
    laplace_smoothing=True,
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current",
    segment=["product", "region"]
)

mean

mean(
    name: str,
    dataset: LazyFrame | DataFrame,
    variable: str,
    segment: list[str] | None = None,
) -> pl.DataFrame

Calculate the mean summary for the given dataset and parameters.

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the mean on.
  • variable (str, required): Column name for which to compute the mean.
  • segment (list[str] | None, default None): Segmentation groups for calculation.

Returns:

pl.DataFrame: DataFrame containing the mean summary and associated metadata.
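
A short usage sketch; the dataset and column names below ("balance", "region") are hypothetical:

result = mean(
    name="average_balance",
    dataset=df,
    variable="balance",
    segment=["region"]
)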

median

median(
    name: str,
    dataset: LazyFrame | DataFrame,
    variable: str,
    segment: list[str] | None = None,
) -> pl.DataFrame

Calculate the median summary for the given dataset and parameters.

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the median on.
  • variable (str, required): Column name for which to compute the median.
  • segment (list[str] | None, default None): Segmentation groups for calculation.

Returns:

pl.DataFrame: DataFrame containing the median summary and associated metadata.
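
A short usage sketch mirroring mean(); the dataset and column names below ("exposure", "product") are hypothetical:

result = median(
    name="median_exposure",
    dataset=df,
    variable="exposure",
    segment=["product"]
)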
