Workflows Configuration¶

The workflows module provides YAML-driven configuration for running multiple metrics as batches. This is the recommended approach for:

Batch processing of multiple metrics
Standardized metric configurations
Production environments with consistent setups

Overview¶

The workflows system allows you to define metrics and datasets in YAML files, enabling declarative configuration of statistical computations. Each metric configuration supports fan-out expansion, where lists of values automatically expand into multiple metric instances.

Core Concepts¶

Fan-out Expansion¶

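When you provide lists for certain fields (like name and segment), the system automatically expands them into multiple metric configurations. All fan-out fields must have the same length, because entries are paired by position: the first name goes with the first segment, the second name with the second segment, and so on.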
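The positional pairing can be sketched in plain Python. This is only an illustration of the rule, not the library's implementation: `expand_fan_out` and its `fan_out_fields` parameter are hypothetical names, and treating `name` and `segment` as the only fan-out fields is an assumption.

```python
from typing import Any


def expand_fan_out(
    config: dict[str, Any],
    fan_out_fields: tuple[str, ...] = ("name", "segment"),
) -> list[dict[str, Any]]:
    """Expand list-valued fan-out fields into one config per position.

    Hypothetical helper. Assumes every fan-out field is given as a list
    with one entry per metric, as in the YAML examples on this page.
    """
    # Record how many entries each fan-out field carries.
    sizes = {field: len(config[field]) for field in fan_out_fields}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"fan-out fields differ in length: {sizes}")

    count = next(iter(sizes.values()))
    expanded = []
    for i in range(count):
        # Copy the shared fields, then pick the i-th entry of each fan-out field.
        item = dict(config)
        for field in fan_out_fields:
            item[field] = config[field][i]
        expanded.append(item)
    return expanded


configs = expand_fan_out(
    {
        "name": ["metric1", "metric2"],
        "segment": [["segment1"], ["segment2"]],
        "dataset": "dataset_name",
    }
)
```

Here `configs` holds two entries: `metric1` paired with `["segment1"]` and `metric2` paired with `["segment2"]`, both sharing the same `dataset`.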

Basic Structure¶

metrics:
  metric_id:
    metric_type: "metric_name"
    config:
      name: ["metric1", "metric2"] # Fan-out field
      segment: [["segment1"], ["segment2"]] # Fan-out field - must match length
      # ... other configuration fields
      dataset: "dataset_name"

datasets:
  dataset_name:
    location: "path/to/data.csv"

Segment Configuration¶

Segments define how to group your data for analysis:

Use null for no segmentation on a particular metric
Use ["column_name"] for single-column segmentation
Use ["col1", "col2"] for multi-column segmentation
When using fan-out, each segment entry corresponds to one metric

Example:

config:
  name: ["total_metric", "segmented_metric", "multi_segment_metric"]
  segment: [null, ["product_type"], ["product_type", "region"]]
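To make the three segment forms concrete, the sketch below groups a few rows in plain Python. It is illustrative only: `group_rows` and the sample rows are hypothetical, and the library's own grouping may work differently.

```python
from collections import defaultdict
from typing import Any, Optional


def group_rows(
    rows: list[dict[str, Any]],
    segment: Optional[list[str]],
) -> dict[tuple, list[dict[str, Any]]]:
    """Group rows by the segment columns; None keeps all rows together."""
    columns = segment or []
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # The key is the tuple of this row's values in the segment columns.
        # With no segment columns the key is the empty tuple: a single group.
        key = tuple(row[column] for column in columns)
        groups[key].append(row)
    return dict(groups)


# Hypothetical sample data.
rows = [
    {"product_type": "loan", "region": "north", "value": 1.0},
    {"product_type": "loan", "region": "south", "value": 2.0},
    {"product_type": "card", "region": "north", "value": 3.0},
]

total = group_rows(rows, None)                           # null: one group
by_product = group_rows(rows, ["product_type"])          # single column
by_both = group_rows(rows, ["product_type", "region"])   # multi-column
```

With this sample, the unsegmented case yields one group, `["product_type"]` yields one group per product, and the two-column segment yields one group per (product, region) combination.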

Available Metrics¶

The following metric types are supported:

default_accuracy: Default prediction accuracy validation
ead_accuracy: Exposure at Default accuracy validation
hosmer_lemeshow: Hosmer-Lemeshow goodness-of-fit test
jeffreys_test: Jeffreys Bayesian calibration test
mape: Mean Absolute Percentage Error for scale-independent prediction accuracy
rmse: Root Mean Squared Error for prediction accuracy assessment
auc: Area Under the ROC Curve discrimination metric
gini: Gini coefficient discrimination metric
population_stability_index: Population Stability Index for distribution shift monitoring
mean: Mean summary statistic
median: Median summary statistic

Each metric supports both record_level and summary_level data formats, except mean and median, which work with any data format.
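For orientation, the textbook definitions behind rmse and mape fit in a few lines of Python. This sketch shows only the standard formulas; the function signatures are illustrative, and the library's versions may handle segmentation, data formats, or zero values differently.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: sqrt(mean((predicted - actual)^2))."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, expressed in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error is undefined when the actual value is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    relative = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * relative / len(actual)
```

Because MAPE divides each error by the actual value, it does not depend on the scale of the data, whereas RMSE is expressed in the same units as the predictions.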

Getting Started¶

Configuration Overview - Learn about YAML structure and fan-out expansion
Complete Examples - Full workflow configurations and patterns

Metric Documentation¶

Accuracy Metrics¶

Default Accuracy - Binary classification accuracy
EAD Accuracy - Exposure at Default accuracy with confidence intervals
MAPE - Mean Absolute Percentage Error for scale-independent accuracy
RMSE - Root Mean Squared Error for continuous prediction accuracy

Statistical Tests¶

AUC - Area Under Curve for discrimination
Gini - Gini coefficient for discrimination
Kolmogorov-Smirnov - KS statistic for discrimination testing
Hosmer-Lemeshow - Goodness of fit testing
Jeffreys Test - Bayesian calibration testing
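AUC and Gini are closely related: under the usual definition, Gini = 2 * AUC - 1. The sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as one half. It illustrates the standard definitions in plain Python; the function names are hypothetical and this is not the library's implementation.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC: 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0
```

The pairwise loop is quadratic in the number of observations, so it suits small illustrations rather than production-sized datasets.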

Stability Metrics¶

Population Stability Index - Distribution shift monitoring and population drift detection
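As a reference point, the commonly used PSI formula sums (actual - expected) * ln(actual / expected) over bins, where both terms are bin proportions. A minimal sketch, assuming the two distributions are already binned into proportions; `psi` and the `epsilon` floor for empty bins are illustrative choices, not the library's API.

```python
import math
from typing import Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    epsilon: float = 1e-6,
) -> float:
    """Population Stability Index over pre-binned proportions."""
    if len(expected) != len(actual) or len(expected) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins so the logarithm stays defined (an assumption
        # of this sketch; other conventions exist).
        e = max(e, epsilon)
        a = max(a, epsilon)
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions give a PSI of zero, and the value grows as the actual proportions drift away from the expected ones.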

Summary Statistics¶

Mean - Arithmetic mean calculation with segmentation
Median - Robust central tendency with quartiles
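The difference between the two summary statistics shows up on skewed data. The sketch below uses Python's standard `statistics` module; `summarize` and its output keys are hypothetical, and the quartile interpolation used by `statistics.quantiles` may not match what the library reports.

```python
import statistics
from typing import Sequence


def summarize(values: Sequence[float]) -> dict[str, float]:
    """Return mean, median, and first/third quartiles for a sample."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


# A single outlier pulls the mean upward but leaves the median unchanged.
summary = summarize([1, 2, 3, 4, 100])
```

In this sample the mean is 22.0 while the median stays at 3, which is why the median is described as robust.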

Each metric documentation includes configuration fields, output columns, data requirements, fan-out examples, and usage notes.

workflows ¶

YAML workflow interface for batch metric execution.

This module provides the YAML workflow approach for using the TNP statistic library. Define metric collections in YAML files and execute them as batches.

Example usage

from tnp_statistic_library.workflows import load_configuration_from_yaml

config = load_configuration_from_yaml("my_metrics.yaml")
results = config.metrics.collect_all()
df = results.to_dataframe()

load_configuration_from_yaml ¶

load_configuration_from_yaml(
    yaml_file: str | Path,
) -> Configuration

Load configuration from a YAML file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `yaml_file` | `str \| Path` | Path to YAML file or raw YAML string | required |

Returns:

| Type | Description |
|---|---|
| `Configuration` | Configuration object that can be used to collect metrics |

Example

config = load_configuration_from_yaml("metrics.yaml")
results = config.metrics.collect_all()
