Accuracy Metrics

Accuracy metrics for model validation and performance assessment.

accuracy

Accuracy metrics helper functions.

This module provides convenient helper functions for accuracy-related statistical metrics.

default_accuracy

default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
default_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate default accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level") requires prob_def and default.

Summary-level usage (data_format="summary_level") requires mean_pd, defaults, and volume.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the default accuracy on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `prob_def` (str), `default` (str), `segment` (optional). Summary-level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing default accuracy metrics for each group. |

Examples:

Record-level usage:

result = default_accuracy(
    name="model_accuracy",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = default_accuracy(
    name="portfolio_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
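The docstring above doesn't pin down the exact statistic; a common default-accuracy check compares the mean predicted PD against the observed default rate. A minimal stdlib sketch under that assumption (all data values hypothetical, not the library's implementation):

```python
# Hypothetical record-level data: (predicted PD, 0/1 default flag) per account.
records = [
    (0.02, 0), (0.05, 0), (0.10, 1), (0.04, 0), (0.08, 1),
]

mean_pd = sum(pd for pd, _ in records) / len(records)            # mean predicted PD
default_rate = sum(flag for _, flag in records) / len(records)   # observed default rate

# One simple accuracy measure: observed rate relative to predicted rate.
ratio = default_rate / mean_pd
```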

ead_accuracy

ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    predicted_ead: str,
    actual_ead: str,
    *,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
ead_accuracy(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    predicted_ead: str,
    actual_ead: str,
    **kwargs: Any,
) -> pl.DataFrame

Calculate EAD accuracy for record-level or summary-level data.

Record-level usage (data_format="record_level") additionally requires default.

Summary-level usage (data_format="summary_level") additionally requires defaults and volume.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the EAD accuracy on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `predicted_ead` | `str` | Column containing predicted EAD values. | *required* |
| `actual_ead` | `str` | Column containing actual EAD values. | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `default` (str), `segment` (optional). Summary-level: `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing EAD accuracy metrics for each group. |

Examples:

Record-level usage:

result = ead_accuracy(
    name="ead_model_accuracy",
    dataset=df,
    data_format="record_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    default="default_flag"
)

Summary-level usage:

result = ead_accuracy(
    name="portfolio_ead_accuracy",
    dataset=summary_df,
    data_format="summary_level",
    predicted_ead="predicted_ead",
    actual_ead="actual_ead",
    defaults="defaults",
    volume="volume"
)
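The underlying EAD statistic isn't specified above; one common convention compares total predicted exposure with total actual exposure among defaulted accounts. A hedged stdlib sketch of that idea (data and convention hypothetical):

```python
# Hypothetical record-level data: (predicted_ead, actual_ead, default_flag).
records = [
    (100.0, 110.0, 1), (250.0, 240.0, 1), (80.0, 0.0, 0),
]

# Restrict to defaulted accounts, where actual EAD is observed.
defaulted = [(p, a) for p, a, flag in records if flag == 1]

pred_total = sum(p for p, _ in defaulted)
actual_total = sum(a for _, a in defaulted)

# One common accuracy measure: ratio of predicted to actual exposure.
ead_ratio = pred_total / actual_total
```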

hosmer_lemeshow

hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    bands: int = 10,
    segment: list[str] | None = None,
) -> pl.DataFrame
hosmer_lemeshow(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Hosmer-Lemeshow metric for record-level or summary-level data.

Record-level usage (data_format="record_level") requires prob_def and default.

Summary-level usage (data_format="summary_level") requires mean_pd, defaults, and volume.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the Hosmer-Lemeshow test on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `prob_def` (str), `default` (str), `bands` (int, default 10), `segment` (optional). Summary-level: `mean_pd` (str), `defaults` (str), `volume` (str), `bands` (int, default 10), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing the Hosmer-Lemeshow test result and associated metadata. |

Examples:

Record-level usage:

result = hosmer_lemeshow(
    name="hl_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag",
    bands=10
)

Summary-level usage:

result = hosmer_lemeshow(
    name="portfolio_hl_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume",
    bands=10
)
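Assuming the standard Hosmer-Lemeshow definition (the exact implementation isn't shown here), the statistic sums (observed - expected)² / (n·π·(1-π)) over the bands and is referred to a chi-square distribution with bands - 2 degrees of freedom. A stdlib sketch on hypothetical summary-level bands:

```python
# Hypothetical summary-level bands: (mean_pd, defaults, volume) per band.
bands = [
    (0.01, 12, 1000),
    (0.05, 45, 1000),
    (0.10, 110, 1000),
]

# Hosmer-Lemeshow chi-square statistic: for each band with forecast PD pi
# and volume n, the contribution is (defaults - n*pi)^2 / (n*pi*(1 - pi)).
hl_stat = sum(
    (d - pd * n) ** 2 / (n * pd * (1 - pd))
    for pd, d, n in bands
)
# Under H0 the statistic is approximately chi-square with (bands - 2) df.
```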

jeffreys_test

jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    prob_def: str,
    default: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    mean_pd: str,
    defaults: str,
    volume: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
jeffreys_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate the Jeffreys test metric for record-level or summary-level data.

Record-level usage (data_format="record_level") requires prob_def and default.

Summary-level usage (data_format="summary_level") requires mean_pd, defaults, and volume.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the Jeffreys test on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `prob_def` (str), `default` (str), `segment` (optional). Summary-level: `mean_pd` (str), `defaults` (str), `volume` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing the Jeffreys test result and associated metadata. |

Examples:

Record-level usage:

result = jeffreys_test(
    name="jeffreys_test",
    dataset=df,
    data_format="record_level",
    prob_def="probability",
    default="default_flag"
)

Summary-level usage:

result = jeffreys_test(
    name="portfolio_jeffreys_test",
    dataset=summary_df,
    data_format="summary_level",
    mean_pd="mean_pd",
    defaults="defaults",
    volume="volume"
)
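Assuming the standard Jeffreys test for PD calibration: with d defaults out of n observations, the posterior for the true PD under a Jeffreys prior is Beta(d + 0.5, n - d + 0.5), and the p-value is that posterior's CDF at the forecast PD. A stdlib sketch that approximates the CDF by midpoint-rule integration (numbers hypothetical; a real implementation would use an incomplete-beta routine):

```python
import math

# Hypothetical portfolio: n observations, d defaults, forecast PD.
n, d, pd_forecast = 1000, 30, 0.05

# Posterior under a Jeffreys prior: Beta(d + 0.5, n - d + 0.5).
a, b = d + 0.5, n - d + 0.5
log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(x: float) -> float:
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

# Midpoint-rule integration of the posterior density over [0, pd_forecast]
# approximates P(true PD <= pd_forecast), i.e. the Jeffreys p-value.
steps = 100_000
h = pd_forecast / steps
p_value = sum(beta_pdf((i + 0.5) * h) for i in range(steps)) * h
```

Here the observed default rate (3%) sits well below the 5% forecast, so the p-value is close to 1.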

rmse

rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_squared_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
rmse(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Root Mean Squared Error (RMSE) for record-level or summary-level data.

Record-level usage (data_format="record_level") requires observed and predicted.

Summary-level usage (data_format="summary_level") requires volume and sum_squared_errors.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the RMSE on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `observed` (str), `predicted` (str), `segment` (optional). Summary-level: `volume` (str), `sum_squared_errors` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing RMSE metrics for each group. |

Examples:

Record-level usage:

result = rmse(
    name="model_rmse",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = rmse(
    name="portfolio_rmse",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_squared_errors="sum_squared_errors"
)
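The two data formats should agree: the summary-level inputs are presumably just the record-level quantities pre-aggregated. A stdlib sketch of that equivalence (data hypothetical):

```python
import math

# Hypothetical record-level data: (observed, predicted) pairs.
pairs = [(10.0, 12.0), (20.0, 18.0), (30.0, 33.0)]

# Record-level RMSE: square root of the mean squared error.
rmse_record = math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

# Summary-level inputs carry the same information pre-aggregated.
volume = len(pairs)
sum_squared_errors = sum((o - p) ** 2 for o, p in pairs)

rmse_summary = math.sqrt(sum_squared_errors / volume)
```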

mape

mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_absolute_percentage_errors: str,
    segment: list[str] | None = None,
) -> pl.DataFrame
mape(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate Mean Absolute Percentage Error (MAPE) for record-level or summary-level data.

Record-level usage (data_format="record_level") requires observed and predicted.

Summary-level usage (data_format="summary_level") requires volume and sum_absolute_percentage_errors.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the MAPE on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `observed` (str), `predicted` (str), `segment` (optional). Summary-level: `volume` (str), `sum_absolute_percentage_errors` (str), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing MAPE metrics for each group. |

Examples:

Record-level usage:

result = mape(
    name="model_mape",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = mape(
    name="portfolio_mape",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_absolute_percentage_errors="sum_absolute_percentage_errors"
)
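As with RMSE, the summary-level inputs are presumably the record-level quantities pre-aggregated, so both formats yield the same value. A stdlib sketch (data hypothetical):

```python
# Hypothetical record-level data: (observed, predicted) pairs.
pairs = [(100.0, 90.0), (200.0, 220.0), (50.0, 55.0)]

# Record-level MAPE: mean of |observed - predicted| / |observed|.
mape_record = sum(abs(o - p) / abs(o) for o, p in pairs) / len(pairs)

# Summary-level inputs carry the same quantities pre-aggregated.
volume = len(pairs)
sum_abs_pct_errors = sum(abs(o - p) / abs(o) for o, p in pairs)

mape_summary = sum_abs_pct_errors / volume
```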

ttest

ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    observed: str,
    predicted: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    sum_differences: str,
    sum_squared_differences: str,
    null_hypothesis_mean: float = 0.0,
    segment: list[str] | None = None,
) -> pl.DataFrame
ttest(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate T-test statistics for record-level or summary-level data.

Performs a one-sample t-test to determine if the mean difference between observed and predicted values is significantly different from a null hypothesis mean.

Record-level usage (data_format="record_level") requires observed and predicted; null_hypothesis_mean is optional (default 0.0).

Summary-level usage (data_format="summary_level") requires volume, sum_differences, and sum_squared_differences; null_hypothesis_mean is optional (default 0.0).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the metric. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | Dataset to compute the t-test on. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data ("record_level" or "summary_level"). | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. Record-level: `observed` (str), `predicted` (str), `null_hypothesis_mean` (float), `segment` (optional). Summary-level: `volume` (str), `sum_differences` (str), `sum_squared_differences` (str), `null_hypothesis_mean` (float), `segment` (optional). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame containing t-test statistics for each group. |

Examples:

Record-level usage:

result = ttest(
    name="model_ttest",
    dataset=df,
    data_format="record_level",
    observed="observed_values",
    predicted="predicted_values"
)

Summary-level usage:

result = ttest(
    name="portfolio_ttest",
    dataset=summary_df,
    data_format="summary_level",
    volume="volume",
    sum_differences="sum_differences",
    sum_squared_differences="sum_squared_differences"
)
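The one-sample t statistic can be recovered from the summary-level sufficient statistics alone, which is presumably why volume, sum_differences, and sum_squared_differences suffice. A stdlib sketch (data hypothetical; the p-value would then come from a t-distribution with n - 1 degrees of freedom):

```python
import math

# Hypothetical per-record differences (observed - predicted).
diffs = [0.5, -0.2, 0.8, 0.1, -0.4]
mu0 = 0.0  # null-hypothesis mean difference

# Summary-level sufficient statistics for the same test.
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(x * x for x in diffs)

mean = sum_d / n
# Sample variance from the sums: (sum d^2 - (sum d)^2 / n) / (n - 1).
var = (sum_d2 - sum_d * sum_d / n) / (n - 1)
t_stat = (mean - mu0) / math.sqrt(var / n)
```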

binomial_test

binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    *,
    default: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    *,
    volume: str,
    defaults: str,
    expected_probability: float,
    segment: list[str] | None = None,
) -> pl.DataFrame
binomial_test(
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    **kwargs: Any,
) -> pl.DataFrame

Calculate binomial test for record-level or summary-level data.

The binomial test is used to test whether an observed proportion of defaults significantly differs from an expected probability under the null hypothesis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name identifier for the metric calculation. | *required* |
| `dataset` | `LazyFrame \| DataFrame` | The input data as a Polars LazyFrame or DataFrame. | *required* |
| `data_format` | `Literal['record_level', 'summary_level']` | Format of the input data, either "record_level" or "summary_level". | *required* |
| `**kwargs` | `Any` | Additional keyword arguments specific to the data format. | `{}` |

Record-level format kwargs:

- `default`: Column name containing binary default indicators (0/1 or boolean).
- `expected_probability`: Expected probability of default under the null hypothesis (0.0-1.0).
- `segment`: Optional list of column names to group by for segmented analysis.

Summary-level format kwargs:

- `volume`: Column name containing the total number of observations.
- `defaults`: Column name containing the number of defaults.
- `expected_probability`: Expected probability of default under the null hypothesis (0.0-1.0).
- `segment`: Optional list of column names to group by for segmented analysis.

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Binomial test results for each group. |

The returned DataFrame has columns:

- `group_key`: Grouping information (struct of segment columns)
- `volume`: Total number of observations
- `defaults`: Number of observed defaults
- `observed_probability`: Observed default rate
- `expected_probability`: Expected default rate under the null hypothesis
- `p_value`: Two-tailed p-value from the binomial test

Examples:

Record-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="record_level",
    default="default_flag",
    expected_probability=0.05
)

Summary-level data:

binomial_test(
    name="default_rate_test",
    dataset=data,
    data_format="summary_level",
    volume="total_accounts",
    defaults="default_count",
    expected_probability=0.05
)
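Assuming an exact two-tailed binomial test in the "minimum likelihood" convention (as used by scipy.stats.binomtest; this library's exact convention isn't stated), the p-value sums the probabilities of all outcomes no more likely than the observed count. A stdlib sketch with hypothetical inputs:

```python
import math

# Hypothetical summary-level inputs: 9 defaults in 100 accounts, 5% expected.
volume, defaults, p0 = 100, 9, 0.05

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass function."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Two-tailed p-value: sum the probabilities of all outcomes whose
# probability does not exceed that of the observed count (with a small
# tolerance for floating-point ties).
observed_pmf = binom_pmf(defaults, volume, p0)
p_value = sum(
    binom_pmf(k, volume, p0)
    for k in range(volume + 1)
    if binom_pmf(k, volume, p0) <= observed_pmf * (1 + 1e-12)
)
```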
