Public API Reference

This section documents the public interface for the TNP Statistic Library.

Helper Functions Interface

The primary way to use the library is through the helper functions, which provide a simple, type-safe interface for calculating statistical metrics.

All functions support both record-level and summary-level data formats with automatic validation and optimization.

Complete Function Reference

metrics

Metrics package - Public helper functions for statistical calculations.

This package provides the main public interface for calculating statistical metrics. All metric classes are internal implementations and should not be used directly.

Example usage
from tnp_statistic_library.metrics import default_accuracy, mean, median

result = default_accuracy(
    name="test", dataset=df, data_format="record_level",
    prob_def="prob", default="default"
)

binomial_test

binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    default: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    defaults: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate binomial test for record-level or summary-level data.

The binomial test is used to test whether an observed proportion of defaults significantly differs from an expected probability under the null hypothesis.
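Conceptually, the reported p-value matches a standard two-sided binomial test. A minimal single-group sanity check with scipy (not this library's API; the counts below are made up):

from scipy.stats import binomtest

n_observations = 1_000
n_defaults = 63
expected_probability = 0.05

result = binomtest(n_defaults, n=n_observations, p=expected_probability,
                   alternative="two-sided")
print(result.pvalue)  # small p-value -> observed rate differs from 5%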

Parameters:

  • name (str, required): Name identifier for the metric calculation.
  • dataset (LazyFrame | DataFrame, required): The input data as a Polars LazyFrame or DataFrame.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.

Record-level format kwargs

  • default: Column name containing binary default indicators (0/1 or boolean).
  • expected_probability: Expected probability of default under the null hypothesis (0.0-1.0).
  • segment: Optional list of column names to group by for segmented analysis.

Summary-level format kwargs

  • volume: Column name containing the total number of observations.
  • defaults: Column name containing the number of defaults.
  • expected_probability: Expected probability of default under the null hypothesis (0.0-1.0).
  • segment: Optional list of column names to group by for segmented analysis.

Returns:

  pl.DataFrame: A DataFrame containing binomial test results with columns:

  • group_key: Grouping information (struct of segment columns)
  • volume: Total number of observations
  • defaults: Number of observed defaults
  • observed_probability: Observed default rate
  • expected_probability: Expected default rate under the null hypothesis
  • p_value: Two-tailed p-value from the binomial test

Examples:

Record-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="record_level",
    default="default_flag",
    expected_probability=0.05
)

Summary-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="summary_level",
    volume="total_accounts",
    defaults="default_count",
    expected_probability=0.05
)

default_accuracy

default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate default accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the default accuracy on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing default accuracy metrics for each group.

Examples:

Record-level usage:

result = default_accuracy(
    name="model_accuracy",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = default_accuracy(
    name="portfolio_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

ead_accuracy

ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    predicted_ead: str,
    actual_ead: str,
    **kwargs: Any,
) -> pl.DataFrame

Calculate EAD accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: default

Summary-level usage (data_format="summary_level"): Required parameters: defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the EAD accuracy on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • predicted_ead (str, required): Column containing predicted EAD values.
  • actual_ead (str, required): Column containing actual EAD values.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: default (str), segment (optional).
    For summary_level: defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing EAD accuracy metrics for each group.

Examples:

Record-level usage:

result = ead_accuracy(
    name="ead_model_accuracy",
    dataset=df,
    data_format="record_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    default="default_flag"
)

Summary-level usage:

result = ead_accuracy(
    name="portfolio_ead_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    defaults="defaults",
    volume="volume"
)

hosmer_lemeshow

hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Hosmer-Lemeshow metric for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
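For intuition, the classical Hosmer-Lemeshow statistic compares observed and expected defaults per band and refers the sum to a chi-square distribution. A minimal sketch with scipy; the band data and the degrees-of-freedom convention are assumptions, not this library's internals:

from scipy.stats import chi2

# Per-band (volume, mean predicted PD, observed defaults).
bands = [
    (100, 0.02, 3),
    (100, 0.05, 4),
    (100, 0.10, 12),
]

hl_stat = 0.0
for volume, mean_pd, defaults in bands:
    expected = volume * mean_pd
    # (O - E)^2 / (n * p * (1 - p)), with E = n * p
    hl_stat += (defaults - expected) ** 2 / (expected * (1 - mean_pd))

p_value = chi2.sf(hl_stat, df=len(bands) - 2)  # classical df = bands - 2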

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Hosmer-Lemeshow test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), bands (int, default=10), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), bands (int, default=10), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the Hosmer-Lemeshow test result and associated metadata.

Examples:

Record-level usage:

result = hosmer_lemeshow(
    name="hl_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    bands=10
)

Summary-level usage:

result = hosmer_lemeshow(
    name="portfolio_hl_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    bands=10
)

jeffreys_test

jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Jeffreys test metric for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume
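For intuition, a common formulation of the Jeffreys test in PD validation places a Beta(1/2, 1/2) prior on the default rate and takes the p-value from the resulting posterior. A sketch with scipy; the library's exact convention is not documented here:

from scipy.stats import beta

volume, defaults, mean_pd = 1_000, 60, 0.05

# Posterior for the default rate: Beta(defaults + 0.5, volume - defaults + 0.5).
# p-value = posterior probability that the true rate is at or below mean_pd;
# a low value suggests the predicted mean PD understates the observed rate.
p_value = beta.cdf(mean_pd, defaults + 0.5, volume - defaults + 0.5)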

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Jeffreys test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the Jeffreys test result and associated metadata.

Examples:

Record-level usage:

result = jeffreys_test(
    name="jeffreys_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = jeffreys_test(
    name="portfolio_jeffreys_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

mape

mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_absolute_percentage_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Mean Absolute Percentage Error (MAPE) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_absolute_percentage_errors
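If only record-level data is available, the summary-level inputs can be pre-aggregated. A hypothetical Polars sketch, assuming the usual MAPE definition (mean of |observed - predicted| / |observed|; whether the library scales by 100 is not documented here):

# Pre-aggregate record-level data into the summary-level MAPE inputs;
# all column names here are assumptions.
import polars as pl

record_df = pl.DataFrame({
    "group": ["A", "A", "B", "B"],
    "observed": [100.0, 200.0, 50.0, 80.0],
    "predicted": [110.0, 190.0, 55.0, 72.0],
})

summary_df = record_df.group_by("group").agg(
    pl.len().alias("volume"),
    ((pl.col("observed") - pl.col("predicted")).abs() / pl.col("observed").abs())
    .sum()
    .alias("sum_absolute_percentage_errors"),
)
# Per-group MAPE is then sum_absolute_percentage_errors / volume.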

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the MAPE on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), segment (optional).
    For summary_level: volume (str), sum_absolute_percentage_errors (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing MAPE metrics for each group.

Examples:

Record-level usage:

result = mape(
    name="model_mape",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = mape(
    name="portfolio_mape",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_absolute_percentage_errors="sum_absolute_percentage_errors"
)

rmse

rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_squared_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Root Mean Squared Error (RMSE) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_squared_errors
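Since RMSE = sqrt(sum_squared_errors / volume), the summary-level inputs are just two per-group sums. A hypothetical Polars pre-aggregation sketch (column names are assumptions):

import polars as pl

record_df = pl.DataFrame({
    "group": ["A", "A", "B", "B"],
    "observed": [1.0, 2.0, 3.0, 4.0],
    "predicted": [1.1, 1.8, 3.4, 3.9],
})

summary_df = record_df.group_by("group").agg(
    pl.len().alias("volume"),
    ((pl.col("observed") - pl.col("predicted")) ** 2).sum().alias("sum_squared_errors"),
)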

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the RMSE on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), segment (optional).
    For summary_level: volume (str), sum_squared_errors (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing RMSE metrics for each group.

Examples:

Record-level usage:

result = rmse(
    name="model_rmse",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = rmse(
    name="portfolio_rmse",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_squared_errors="sum_squared_errors"
)

ttest

ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_differences: str,
    sum_squared_differences: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate T-test statistics for record-level or summary-level data.

Performs a one-sample t-test to determine if the mean difference between observed and predicted values is significantly different from a null hypothesis mean.

Record-level usage (data_format="record_level"): Required parameters: observed, predicted Optional parameters: null_hypothesis_mean (default: 0.0)

Summary-level usage (data_format="summary_level"): Required parameters: volume, sum_differences, sum_squared_differences Optional parameters: null_hypothesis_mean (default: 0.0)
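For intuition, the summary-level columns are exactly the sufficient statistics of the one-sample t-test on the differences. A sketch assuming the textbook formulation (the library's internals may differ):

# Derive the t-statistic from summary sums over n paired differences,
# where s1 = sum of (observed - predicted) and s2 = sum of squared differences.
import math
from scipy.stats import t as t_dist

n, s1, s2, mu0 = 500, 12.5, 40.0, 0.0

mean_diff = s1 / n
sample_var = (s2 - s1**2 / n) / (n - 1)          # unbiased sample variance
t_stat = (mean_diff - mu0) / math.sqrt(sample_var / n)
p_value = 2 * t_dist.sf(abs(t_stat), df=n - 1)   # two-tailed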

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the T-test on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: observed (str), predicted (str), null_hypothesis_mean (float), segment (optional).
    For summary_level: volume (str), sum_differences (str), sum_squared_differences (str), null_hypothesis_mean (float), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing T-test statistics for each group.

Examples:

Record-level usage:

result = ttest(
    name="model_ttest",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = ttest(
    name="portfolio_ttest",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_differences="sum_differences",
    sum_squared_differences="sum_squared_differences"
)

auc

auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
auc(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Area Under the ROC Curve (AUC) for record-level or summary-level data.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the AUC on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the AUC result and associated metadata.

Examples:

Record-level usage:

result = auc(
    name="model_auc",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = auc(
    name="portfolio_auc",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

f1_score

f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f1_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the F1 score for record-level or summary-level data.

The F1 score is the harmonic mean of precision and recall, providing a balanced measure of classification performance.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default Optional parameters: threshold (default 0.5)

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume Optional parameters: threshold (default 0.5)
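As a concrete illustration of the definition, a minimal library-independent computation from thresholded probabilities (all data here is made up):

# F1 = 2 * precision * recall / (precision + recall), after thresholding.
probs = [0.9, 0.8, 0.3, 0.7, 0.2]
actuals = [1, 0, 1, 1, 0]
threshold = 0.5

preds = [p >= threshold for p in probs]
tp = sum(1 for p, a in zip(preds, actuals) if p and a == 1)
fp = sum(1 for p, a in zip(preds, actuals) if p and a == 0)
fn = sum(1 for p, a in zip(preds, actuals) if not p and a == 1)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # ~0.667 for this data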

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the F1 score on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), threshold (float), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), threshold (float), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the F1 score and associated metrics for each group.

Examples:

Record-level usage:

result = f1_score(
    name="model_f1",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.6
)

Summary-level usage:

result = f1_score(
    name="portfolio_f1",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.4
)

f2_score

f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    threshold: float = 0.5,
    segment: list[str] | None = None,
) -> pl.DataFrame
f2_score(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the F2 score for record-level or summary-level data.

The F2 score weights recall higher than precision, making it suitable for scenarios where missing positive cases (false negatives) is more costly than false positives.

Record-level usage (data_format="record_level"): Required parameters: prob_def, default Optional parameters: threshold (default 0.5)

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume Optional parameters: threshold (default 0.5)
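The recall weighting comes from the general F-beta family, of which the F2 score is the beta = 2 case. A short library-independent sketch:

# F-beta weights recall beta times as heavily as precision;
# beta = 2 gives the F2 score.
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.8, 0.4))  # ~0.44
print(f_beta(0.4, 0.8))  # ~0.67: F2 rewards recall over precision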

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the F2 score on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), threshold (float), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), threshold (float), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the F2 score and associated metrics for each group.

Examples:

Record-level usage:

result = f2_score(
    name="model_f2",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    threshold=0.3
)

Summary-level usage:

result = f2_score(
    name="portfolio_f2",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    threshold=0.7
)

gini

gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
gini(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Gini coefficient for record-level or summary-level data.

The Gini coefficient is calculated as 2*AUC - 1, where AUC is the Area Under the ROC Curve. It ranges from -1 to 1, where:

  • 1 indicates perfect discrimination
  • 0 indicates no discrimination (random)
  • -1 indicates perfectly inverse discrimination
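The identity can be sanity-checked against any generic AUC implementation; a minimal sketch using scikit-learn (not part of this library, illustrative data):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc_value = roc_auc_score(y_true, y_score)
gini_value = 2 * auc_value - 1  # 1 = perfect, 0 = random, -1 = inverted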

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the Gini coefficient on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the Gini coefficient result and associated metadata.

Examples:

Record-level usage:

result = gini(
    name="model_gini",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = gini(
    name="portfolio_gini",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

kolmogorov_smirnov

kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
kolmogorov_smirnov(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Kolmogorov-Smirnov statistic for record-level or summary-level data.

The Kolmogorov-Smirnov statistic measures the maximum difference between the cumulative distribution functions of predicted scores for defaulters vs non-defaulters. It ranges from 0 to 1, where higher values indicate better discrimination.
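Conceptually this is the two-sample KS statistic applied to the score distributions of defaulters versus non-defaulters; a minimal analogue using scipy (illustrative data, not this library's API):

from scipy.stats import ks_2samp

scores_defaulters = [0.6, 0.7, 0.8, 0.9]
scores_non_defaulters = [0.1, 0.2, 0.3, 0.5]

result = ks_2samp(scores_defaulters, scores_non_defaulters)
print(result.statistic, result.pvalue)  # statistic near 1 -> strong separation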

Record-level usage (data_format="record_level"): Required parameters: prob_def, default

Summary-level usage (data_format="summary_level"): Required parameters: mean_pd, defaults, volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the KS statistic on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: prob_def (str), default (str), segment (optional).
    For summary_level: mean_pd (str), defaults (str), volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the KS statistic, p-value, and associated metadata.

Examples:

Record-level usage:

result = kolmogorov_smirnov(
    name="model_ks",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = kolmogorov_smirnov(
    name="portfolio_ks",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)

shapiro_wilk

shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    data_column: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    volume: str,
    statistic: str,
    p_value: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    segment: SegmentCol = None,
    **kwargs,
) -> ShapiroWilk

Compute the Shapiro-Wilk test for normality.

The Shapiro-Wilk test is a statistical test to assess whether a dataset follows a normal distribution. It is considered one of the most powerful normality tests, especially for small to medium sample sizes.

The test returns:

  • statistic: The test statistic (W), ranging from 0 to 1
  • p_value: The p-value for the test
  • volume: The number of observations used in the test

Interpretation guidelines:

  • The null hypothesis (H0) assumes the data follows a normal distribution.
  • The alternative hypothesis (H1) assumes the data does not follow a normal distribution.
  • Compare p_value to your chosen significance level (alpha):
    If p_value < alpha: evidence against normality (reject H0).
    If p_value >= alpha: insufficient evidence against normality (fail to reject H0).
  • Common alpha values: 0.05 (5%), 0.01 (1%), or 0.10 (10%).

Limitations:

  • Requires at least 3 observations.
  • Maximum sample size is 5000 (scipy limitation).
  • Sensitive to outliers and ties in the data.
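Since the limitations above point at scipy, the underlying test can be previewed in isolation, applying the interpretation guideline (the alpha value is just an example):

from scipy.stats import shapiro

values = [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2]

statistic, p_value = shapiro(values)
alpha = 0.05
if p_value < alpha:
    print("evidence against normality (reject H0)")
else:
    print("insufficient evidence against normality (fail to reject H0)")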

Parameters:

  • name (str, required): The name identifier for this metric instance.
  • dataset (LazyFrame | DataFrame, required): The input dataset as either a LazyFrame or DataFrame.
  • data_format (Literal["record_level", "summary_level"], required): The format of the input data.
  • segment (SegmentCol, default None): Optional list of column names to use for segmentation/grouping.
  • **kwargs (default {}): Additional arguments based on data_format.

Record-level format args

  • data_column: The column containing the data to test for normality.

Summary-level format args

  • volume: The column containing the count of observations.
  • statistic: The column containing pre-computed Shapiro-Wilk statistics.
  • p_value: The column containing pre-computed p-values.

Returns:

  ShapiroWilk: A ShapiroWilk metric instance ready for computation.

Examples:

Record-level usage:

>>> import polars as pl
>>> from tnp_statistic_library.metrics.normality import shapiro_wilk
>>>
>>> # Create sample data
>>> df = pl.DataFrame({
...     "values": [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2],
...     "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
... })
>>>
>>> # Test normality for each group
>>> metric = shapiro_wilk(
...     name="data_normality",
...     dataset=df,
...     data_format="record_level",
...     data_column="values",
...     segment=["group"]
... )
>>> result = metric.run_metric().collect()

Summary-level usage:

>>> df_summary = pl.DataFrame({
...     "volume": [50, 45],
...     "statistic": [0.95, 0.92],
...     "p_value": [0.06, 0.03],
...     "region": ["North", "South"]
... })
>>>
>>> metric = shapiro_wilk(
...     name="regional_normality",
...     dataset=df_summary,
...     data_format="summary_level",
...     volume="volume",
...     statistic="statistic",
...     p_value="p_value",
...     segment=["region"]
... )
>>> result = metric.run_metric().collect()

population_stability_index

population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    band_column: str,
    baseline_column: str,
    current_column: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    band_column: str,
    baseline_volume: str,
    current_volume: str,
    segment: list[str] | None = None,
    laplace_smoothing: bool = False,
) -> pl.DataFrame
population_stability_index(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Population Stability Index (PSI) for record-level or summary-level data.

The Population Stability Index measures distributional stability of predicted probabilities or model scores over time by comparing the distribution across bands between baseline and current periods.

PSI formula: Σ (Current% - Baseline%) * ln(Current% / Baseline%), summed over bands (a worked example follows the interpretation guide below).

PSI interpretation
  • PSI < 0.1: Stable (no significant change)
  • 0.1 ≤ PSI < 0.2: Moderate shift (monitor closely)
  • PSI ≥ 0.2: Significant shift (investigate/retrain)
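As a worked example of the formula, using the same band volumes as the summary-level example further below:

# Worked PSI computation in plain Python, mirroring the formula above.
import math

baseline = [1000, 1500, 500]   # per-band volumes, baseline period
current = [1200, 1200, 600]    # per-band volumes, current period

b_total, c_total = sum(baseline), sum(current)
psi = sum(
    (c / c_total - b / b_total) * math.log((c / c_total) / (b / b_total))
    for b, c in zip(baseline, current)
)
print(psi)  # ~0.04, i.e. stable by the guide above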

Record-level usage (data_format="record_level"): Required parameters: band_column, baseline_column, current_column

Summary-level usage (data_format="summary_level"): Required parameters: band_column, baseline_volume, current_volume

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the PSI on.
  • data_format (Literal["record_level", "summary_level"], required): Format of the input data.
  • laplace_smoothing (bool, default False): Whether to apply Laplace smoothing to avoid zero-division errors.
  • **kwargs (Any, default {}): Additional keyword arguments specific to the data format.
    For record_level: band_column (str), baseline_column (str), current_column (str), segment (optional).
    For summary_level: band_column (str), baseline_volume (str), current_volume (str), segment (optional).

Returns:

  pl.DataFrame: DataFrame containing the PSI result and associated metadata, including:

  • baseline_volume: Total volume in the baseline period
  • current_volume: Total volume in the current period
  • volume: Total volume across both periods
  • bucket_count: Number of unique bands/buckets
  • psi: Population Stability Index value

Examples:

Record-level usage with period indicators:

# Combined dataset with period indicators
import polars as pl

data = pl.DataFrame({
    "band": ["A", "A", "B", "B", "C", "C", "A", "A", "A", "B", "B", "C"],
    "is_baseline": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = baseline
    "is_current": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],   # 1 = current
})

result = population_stability_index(
    name="model_stability_check",
    dataset=data,
    data_format="record_level",
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current"
)

Summary-level usage with explicit volumes:

# Pre-aggregated data with volumes by band
summary_data = pl.DataFrame({
    "band": ["A", "B", "C"],
    "baseline_volume": [1000, 1500, 500],    # Development period volumes
    "current_volume": [1200, 1200, 600]      # Current period volumes
})

result = population_stability_index(
    name="summary_stability_check",
    dataset=summary_data,
    data_format="summary_level",
    band_column="band",
    baseline_volume="baseline_volume",
    current_volume="current_volume"
)

With segmentation and laplace smoothing:

result = population_stability_index(
    name="segmented_stability",
    dataset=data,
    data_format="record_level",
    laplace_smoothing=True,
    band_column="band",
    baseline_column="is_baseline",
    current_column="is_current",
    segment=["product", "region"]
)

mean

mean(
    name: str,
    dataset: LazyFrame | DataFrame,
    variable: str,
    segment: list[str] | None = None,
) -> pl.DataFrame

Calculate the mean summary for the given dataset and parameters.

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the mean on.
  • variable (str, required): Column name for which to compute the mean.
  • segment (list[str] | None, default None): Segmentation groups for the calculation.

Returns:

  pl.DataFrame: DataFrame containing the mean summary and associated metadata.

median

median(
    name: str,
    dataset: LazyFrame | DataFrame,
    variable: str,
    segment: list[str] | None = None,
) -> pl.DataFrame

Calculate the median summary for the given dataset and parameters.

Parameters:

  • name (str, required): Name of the metric.
  • dataset (LazyFrame | DataFrame, required): Dataset to compute the median on.
  • variable (str, required): Column name for which to compute the median.
  • segment (list[str] | None, default None): Segmentation groups for the calculation.

Returns:

  pl.DataFrame: DataFrame containing the median summary and associated metadata.


Workflows Interface

For batch processing and YAML-driven configurations, use the workflows module:

  • Workflows Interface - load_configuration_from_yaml() function for YAML-based metric execution

Data Formats

The library supports two main data formats to accommodate different analysis scenarios:

Record-Level Data

Each row represents an individual observation (customer, loan, transaction):

  • Best for: Raw model outputs, individual predictions, detailed analysis
  • Performance: Optimal for large datasets with Polars lazy evaluation
  • Segmentation: Full flexibility for grouping and filtering

Example columns:

  • probability: Individual probability of default (0.0-1.0)
  • default_flag: Binary outcome (0/1 or boolean)
  • predicted_ead: Individual predicted exposure at default
  • actual_ead: Individual actual exposure at default

Summary-Level Data

Each row represents pre-aggregated statistics for a segment:

  • Best for: Portfolio summaries, pre-computed statistics, reporting
  • Performance: Fast calculations on aggregated data
  • Segmentation: Limited to existing segment definitions

Example columns:

  • mean_pd: Mean probability of default for the segment (0.0-1.0)
  • defaults: Count of defaults in the segment (positive numbers or None for most metrics)
  • volume: Total number of observations in the segment (positive numbers or None for most metrics)
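A frame of this shape can be derived from record-level data; a hypothetical Polars sketch (column names are assumptions):

import polars as pl

record_df = pl.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "probability": [0.02, 0.04, 0.10, 0.08],
    "default_flag": [0, 1, 1, 0],
})

summary_df = record_df.group_by("segment").agg(
    pl.col("probability").mean().alias("mean_pd"),
    pl.col("default_flag").sum().alias("defaults"),
    pl.len().alias("volume"),
)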

Segmentation

All metrics support flexible segmentation through the segment parameter:

Basic Segmentation

# Group by single column
result = default_accuracy(
    name="accuracy_by_region",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    segment=["region"]
)

Multi-Level Segmentation

# Group by multiple columns
result = mean(
    name="exposure_by_region_product",
    dataset=df,
    variable="exposure_amount",
    segment=["region", "product_type"]
)

Performance

Optimization Tips

  1. Use Summary-Level Data: Generally faster, since calculations run on pre-aggregated rows
  2. Lazy Evaluation: Datasets are processed efficiently with lazy evaluation
  3. Batch Operations: Workflows execute multiple metrics in parallel
  4. Memory Management: Large datasets are streamed rather than loaded entirely

Best Practices

# Efficient: Let Polars handle the optimization
result = default_accuracy(
    name="accuracy",
    dataset=large_df.lazy(),  # Use lazy frames for large data
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

# Less efficient: Pre-filtering reduces optimization opportunities
filtered_df = large_df.filter(pl.col("region") == "North")
result = default_accuracy(
    name="accuracy",
    dataset=filtered_df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Memory Considerations

  • Large Datasets: Use pl.scan_csv() or similar scan functions (see the sketch after this list)
  • Multiple Metrics: Use YAML workflows for batch processing
  • Segmentation: Prefer single-pass segmentation over multiple separate calls
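A minimal sketch of the scan pattern; the file path and column names are hypothetical:

import polars as pl
from tnp_statistic_library.metrics import default_accuracy

lazy_df = pl.scan_csv("loans.csv")  # nothing is read into memory yet

result = default_accuracy(
    name="accuracy",
    dataset=lazy_df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
)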