
Workflows Configuration

The workflows module provides YAML-driven configuration for running multiple metrics as batches. This is the recommended approach for:

  • Batch processing of multiple metrics
  • Standardized metric configurations
  • Production environments with consistent setups

Overview

The workflows system allows you to define metrics and datasets in YAML files, enabling declarative configuration of statistical computations. Each metric configuration supports fan-out expansion, where lists of values automatically expand into multiple metric instances.

Core Concepts

Fan-out Expansion

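When you provide lists for certain fields (such as name and segment), the system expands them into multiple metric configurations, one per list position. All fan-out fields must have the same length, because entries are paired by position: the first name goes with the first segment, the second with the second, and so on.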
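The pairing rule can be illustrated with a small sketch. This is not the library's implementation: the function name expand_fan_out and the FAN_OUT_FIELDS tuple are hypothetical, and the sketch assumes that only name and segment fan out and that a fan-out value is always a list with one entry per metric.

```python
from typing import Any

# Assumption: only the fields named in the text above take part in fan-out.
FAN_OUT_FIELDS = ("name", "segment")


def expand_fan_out(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand list-valued fan-out fields into one config per metric instance."""
    lists = {
        key: value
        for key, value in config.items()
        if key in FAN_OUT_FIELDS and isinstance(value, list)
    }
    if not lists:
        # Nothing to expand: the config describes a single metric.
        return [dict(config)]

    lengths = {len(value) for value in lists.values()}
    if len(lengths) != 1:
        raise ValueError(
            f"fan-out fields must have the same length, got {sorted(lengths)}"
        )

    count = lengths.pop()
    # Pair entries by position; non-fan-out fields are copied to every instance.
    return [
        {**config, **{key: value[i] for key, value in lists.items()}}
        for i in range(count)
    ]


expanded = expand_fan_out(
    {
        "name": ["metric1", "metric2"],
        "segment": [["segment1"], ["segment2"]],
        "dataset": "dataset_name",
    }
)
```

Here expanded holds two configurations, one per name, each carrying its own segment and the shared dataset. Lists of different lengths raise a ValueError instead of being silently truncated.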

Basic Structure

metrics:
  metric_id:
    metric_type: "metric_name"
    config:
      name: ["metric1", "metric2"] # Fan-out field
      segment: [["segment1"], ["segment2"]] # Fan-out field - must match length
      # ... other configuration fields
      dataset: "dataset_name"

datasets:
  dataset_name:
    location: "path/to/data.csv"
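In the structure above, each metric's dataset field refers to a key under datasets. The following sketch checks that cross-reference on an already parsed document (a plain dict). The helper find_missing_datasets is hypothetical and is not part of the library, which may perform its own validation.

```python
from __future__ import annotations


def find_missing_datasets(document: dict) -> dict[str, str | None]:
    """Return {metric_id: dataset} for metrics whose dataset is not defined."""
    datasets = document.get("datasets") or {}
    missing: dict[str, str | None] = {}
    for metric_id, metric in (document.get("metrics") or {}).items():
        dataset = (metric.get("config") or {}).get("dataset")
        if dataset not in datasets:
            missing[metric_id] = dataset
    return missing


# The same shape as the YAML above, written as a Python dict.
document = {
    "metrics": {
        "metric_id": {
            "metric_type": "metric_name",
            "config": {
                "name": ["metric1", "metric2"],
                "segment": [["segment1"], ["segment2"]],
                "dataset": "dataset_name",
            },
        }
    },
    "datasets": {"dataset_name": {"location": "path/to/data.csv"}},
}

missing = find_missing_datasets(document)
```

For the document shown, missing is empty. Renaming the key under datasets without updating the metric would report metric_id as referring to an undefined dataset.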

Segment Configuration

Segments define how to group your data for analysis:

  • Use null for no segmentation on a particular metric
  • Use ["column_name"] for single-column segmentation
  • Use ["col1", "col2"] for multi-column segmentation
  • When using fan-out, each segment entry corresponds to one metric

Example:

config:
  name: ["total_metric", "segmented_metric", "multi_segment_metric"]
  segment: [null, ["product_type"], ["product_type", "region"]]
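To make the segment semantics concrete, the sketch below groups a few rows the way the three entries in this example would: no segmentation, one column, and two columns. The source does not describe how the library groups data internally, so group_rows and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_rows(rows, segment):
    """Group rows by the segment columns; None puts all rows in one group."""
    if segment is None:
        return {(): list(rows)}
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of values in the segment columns.
        key = tuple(row[column] for column in segment)
        groups[key].append(row)
    return dict(groups)


# Hypothetical sample data with the columns used in the example above.
rows = [
    {"product_type": "loan", "region": "north", "value": 1.0},
    {"product_type": "loan", "region": "south", "value": 2.0},
    {"product_type": "card", "region": "north", "value": 3.0},
]

names = ["total_metric", "segmented_metric", "multi_segment_metric"]
segments = [None, ["product_type"], ["product_type", "region"]]

# Each name is paired with the segment entry at the same position.
group_counts = {
    name: len(group_rows(rows, segment))
    for name, segment in zip(names, segments)
}
```

With this sample data, total_metric is computed over a single group, segmented_metric over two groups (one per product type), and multi_segment_metric over three groups (one per product type and region combination).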

Available Metrics

The following metric types are supported:

  • default_accuracy: Default prediction accuracy validation
  • ead_accuracy: Exposure at Default accuracy validation
  • hosmer_lemeshow: Hosmer-Lemeshow goodness-of-fit test
  • jeffreys_test: Jeffreys Bayesian calibration test
  • mape: Mean Absolute Percentage Error for scale-independent prediction accuracy
  • rmse: Root Mean Squared Error for prediction accuracy assessment
  • auc: Area Under the ROC Curve discrimination metric
  • gini: Gini coefficient discrimination metric
  • population_stability_index: Population Stability Index for distribution shift monitoring
  • mean: Mean summary statistic
  • median: Median summary statistic

Every metric except mean and median supports both the record_level and summary_level data formats; mean and median work with any data format.

Getting Started

  1. Configuration Overview - Learn about YAML structure and fan-out expansion
  2. Complete Examples - Full workflow configurations and patterns

Metric Documentation

Accuracy Metrics

  • Default Accuracy - Binary classification accuracy
  • EAD Accuracy - Exposure at Default accuracy with confidence intervals
  • MAPE - Mean Absolute Percentage Error for scale-independent accuracy
  • RMSE - Root Mean Squared Error for continuous prediction accuracy

Statistical Tests

Stability Metrics

Summary Statistics

  • Mean - Arithmetic mean calculation with segmentation
  • Median - Robust central tendency with quartiles

Each metric documentation includes configuration fields, output columns, data requirements, fan-out examples, and usage notes.

workflows

YAML workflow interface for batch metric execution.

This module provides the YAML workflow approach for using the TNP statistic library. Define metric collections in YAML files and execute them as batches.

Example usage
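from tnp_statistic_library.workflows import load_configuration_from_yaml

# Load the metric and dataset definitions from the YAML file
config = load_configuration_from_yaml("my_metrics.yaml")
# Execute all configured metrics as a batch
results = config.metrics.collect_all()
# Convert the collected results to a DataFrame
df = results.to_dataframe()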

load_configuration_from_yaml

load_configuration_from_yaml(
    yaml_file: str | Path,
) -> Configuration

Load configuration from a YAML file.

Parameters:

Name       Type        Description                           Default
yaml_file  str | Path  Path to YAML file or raw YAML string  required

Returns:

Type           Description
Configuration  Configuration object that can be used to collect metrics

Example
config = load_configuration_from_yaml("metrics.yaml")
results = config.metrics.collect_all()
