Recipes Configuration¶

The recipes module provides YAML-driven configuration for running multiple metrics as batches. This is the recommended approach for:

Batch processing of multiple metrics
Standardized metric configurations
Production environments with consistent setups

Overview¶

The recipes system allows you to define metrics and datasets in YAML files, enabling declarative configuration of statistical computations. Each metric configuration supports fan-out expansion, where lists of values automatically expand into multiple metric instances.

Core Concepts¶

Fan-out Expansion¶

When you provide lists for certain fields (such as name and segment), the system expands them into multiple metric configurations, pairing values by position: the first name goes with the first segment, the second with the second, and so on. All fan-out fields must therefore have the same length.
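The positional pairing described above can be illustrated with a minimal sketch. This is not the library's implementation: `expand_fan_out` is a hypothetical helper, and it assumes that fan-out is triggered when `name` is a list and that only `name` and `segment` fan out.

```python
from typing import Any


def expand_fan_out(metric: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one metric entry into one entry per name (illustrative only)."""
    names = metric["name"]
    if not isinstance(names, list):
        # A scalar name means there is nothing to expand.
        return [dict(metric)]

    # Assumption: a missing segment field means "no segmentation" for every name.
    segments = metric.get("segment")
    if segments is None:
        segments = [None] * len(names)

    if len(segments) != len(names):
        raise ValueError(
            f"fan-out fields must have the same length: "
            f"{len(names)} names vs {len(segments)} segments"
        )

    # Pair values by position; all other fields are copied unchanged.
    return [
        {**metric, "name": name, "segment": segment}
        for name, segment in zip(names, segments)
    ]


metric = {
    "metric_type": "default_accuracy",
    "data_format": "record",
    "name": ["total_metric", "segmented_metric"],
    "segment": [None, ["product_type"]],
}
expanded = expand_fan_out(metric)
for entry in expanded:
    print(entry["name"], entry["segment"])
```

Each expanded entry keeps the shared fields (here `metric_type` and `data_format`) and receives exactly one name and one segment specification.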

Basic Structure¶

datasets:
  dataset_name:
    type: csv
    source: path/to/data.csv
collections:
  metric_id:
    metrics:
    - name:
      - metric1
      - metric2
      segment:
      - - segment1
      - - segment2
      metric_type: metric_name
      data_format: record
    dataset: dataset_name
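
In this structure, each collection names a dataset by its key under datasets. As a rough sketch of that relationship, the snippet below mirrors the YAML as a plain Python dictionary and checks that every collection refers to a declared dataset. `find_undeclared_datasets` is a hypothetical helper written for this illustration. It is not part of the library, and whether the library performs the same check is an assumption.

```python
from typing import Any


def find_undeclared_datasets(config: dict[str, Any]) -> list[str]:
    """Return 'collection: dataset' strings for datasets that are not declared."""
    declared = set(config.get("datasets", {}))
    missing = []
    for collection_id, collection in config.get("collections", {}).items():
        dataset = collection.get("dataset")
        if dataset not in declared:
            missing.append(f"{collection_id}: {dataset}")
    return missing


# The same content as the YAML structure above, written as a dictionary.
config = {
    "datasets": {
        "dataset_name": {"type": "csv", "source": "path/to/data.csv"},
    },
    "collections": {
        "metric_id": {
            "dataset": "dataset_name",
            "metrics": [
                {
                    "name": ["metric1", "metric2"],
                    # "- - segment1" in YAML is a list containing a one-item list.
                    "segment": [["segment1"], ["segment2"]],
                    "metric_type": "metric_name",
                    "data_format": "record",
                }
            ],
        }
    },
}

problems = find_undeclared_datasets(config)
print(problems)
```

Note that the nested `- - segment1` notation produces a list of lists, so each metric receives its own list of segment columns.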

Segment Configuration¶

Segments define how to group your data for analysis:

Use null for no segmentation on a particular metric
Use ["column_name"] for single-column segmentation
Use ["col1", "col2"] for multi-column segmentation
When using fan-out, each segment entry corresponds to one metric

Example:

collections:
  segmentation_example:
    dataset: dataset_name
    metrics:
      - metric_type: default_accuracy
        data_format: record
        name: ["total_metric", "segmented_metric", "multi_segment_metric"]
        segment: [null, ["product_type"], ["product_type", "region"]]
        prob_def: probability
        default: default_flag
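
To make the three segment forms concrete, here is a small sketch of how a segment specification could partition records. It uses only the standard library. `group_by_segment` and the sample rows are hypothetical, and the library's actual grouping logic may differ.

```python
from collections import defaultdict
from typing import Any, Optional


def group_by_segment(
    rows: list[dict[str, Any]], segment: Optional[list[str]]
) -> dict[tuple, list[dict[str, Any]]]:
    """Group rows by the values of the segment columns (illustrative only)."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # A null segment puts every row into a single group keyed by ().
        key = tuple(row[col] for col in segment) if segment else ()
        groups[key].append(row)
    return dict(groups)


# Hypothetical sample records with the columns used in the example above.
rows = [
    {"product_type": "loan", "region": "north", "default_flag": 0},
    {"product_type": "loan", "region": "south", "default_flag": 1},
    {"product_type": "card", "region": "north", "default_flag": 0},
]

total = group_by_segment(rows, None)
by_product = group_by_segment(rows, ["product_type"])
by_product_region = group_by_segment(rows, ["product_type", "region"])

print(len(total), len(by_product), len(by_product_region))
```

With these sample rows, no segmentation yields one group, single-column segmentation yields one group per product type, and multi-column segmentation yields one group per distinct (product type, region) pair.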

Available Metrics¶

The following metric types are supported:

default_accuracy: Default prediction accuracy validation
ead_accuracy: Exposure at Default accuracy validation
hosmer_lemeshow: Hosmer-Lemeshow goodness-of-fit test
jeffreys_test: Jeffreys Bayesian calibration test
mape: Mean Absolute Percentage Error for scale-independent prediction accuracy
rmse: Root Mean Squared Error for prediction accuracy assessment
auc: Area Under the ROC Curve discrimination metric
gini: Gini coefficient discrimination metric
population_stability_index: Population Stability Index for distribution shift monitoring
mean: Mean summary statistic
median: Median summary statistic

Most metrics support both record and summary data formats; mean/median are record-level only.
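
The constraints stated above can be expressed as a small pre-flight check. The sketch below encodes only what this page says: the listed metric type names, the two data formats, and the record-only restriction on mean and median. `check_metric` is a hypothetical helper, not a library function, and other metrics may have restrictions that are not captured here.

```python
# Metric type names as listed on this page.
SUPPORTED_METRICS = {
    "default_accuracy", "ead_accuracy", "hosmer_lemeshow", "jeffreys_test",
    "mape", "rmse", "auc", "gini", "population_stability_index",
    "mean", "median",
}
DATA_FORMATS = {"record", "summary"}
# Stated above: mean and median work on record-level data only.
RECORD_ONLY = {"mean", "median"}


def check_metric(metric_type: str, data_format: str) -> None:
    """Raise ValueError if the combination contradicts the rules on this page."""
    if metric_type not in SUPPORTED_METRICS:
        raise ValueError(f"unknown metric_type: {metric_type!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unknown data_format: {data_format!r}")
    if data_format == "summary" and metric_type in RECORD_ONLY:
        raise ValueError(f"{metric_type} supports record data only")


check_metric("auc", "summary")   # passes silently
check_metric("mean", "record")   # passes silently
```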

Getting Started¶

Configuration Overview - Learn about YAML structure and fan-out expansion
Complete Examples - Full recipe configurations and patterns

Metric Documentation¶

Accuracy Metrics¶

Default Accuracy - Binary classification accuracy
EAD Accuracy - Exposure at Default accuracy with confidence intervals
MAPE - Mean Absolute Percentage Error for scale-independent accuracy
RMSE - Root Mean Squared Error for continuous prediction accuracy

Statistical Tests¶

AUC - Area Under Curve for discrimination
Gini - Gini coefficient for discrimination
Kolmogorov-Smirnov - KS statistic for discrimination testing
Hosmer-Lemeshow - Goodness of fit testing
Jeffreys Test - Bayesian calibration testing

Stability Metrics¶

Population Stability Index - Distribution shift monitoring and population drift detection

Summary Statistics¶

Mean - Arithmetic mean calculation with segmentation
Median - Robust central tendency with quartiles

Each metric's documentation page includes configuration fields, output columns, data requirements, fan-out examples, and usage notes.

recipes ¶

YAML recipe interface for batch metric execution.

This module provides the YAML recipe approach for using the TNP statistic library. Define metric collections in YAML files and execute them as batches.

Example usage

from tnp_statistic_library.recipes import load_configuration_from_yaml

config = load_configuration_from_yaml("my_metrics.yaml")
results = config.collections.run()
df = results.to_dataframe()

load_configuration_from_yaml ¶

load_configuration_from_yaml(
    yaml_file: str | Path,
) -> Configuration

Load configuration from a YAML file.

Parameters:

Name	Type	Description	Default
`yaml_file`	`str | Path`	Path to YAML file or raw YAML string	required

Returns:

Type	Description
`Configuration`	Configuration object that can be used to collect metrics

Example

config = load_configuration_from_yaml("metrics.yaml")
results = config.collections.run()
