Workflows Configuration¶
The workflows module provides YAML-driven configuration for running multiple metrics as batches. This is the recommended approach for:
- Batch processing of multiple metrics
- Standardized metric configurations
- Production environments with consistent setups
Overview¶
The workflows system allows you to define metrics and datasets in YAML files, enabling declarative configuration of statistical computations. Each metric configuration supports fan-out expansion, where lists of values automatically expand into multiple metric instances.
Core Concepts¶
Fan-out Expansion¶
When you provide lists for certain fields (like name and segment), the system automatically expands them into multiple metric configurations. All fan-out fields must have the same length to ensure proper pairing.
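The pairing rule can be illustrated with a minimal Python sketch. It is not the library's implementation: `expand_fan_out` and `FAN_OUT_FIELDS` are hypothetical names, and the sketch assumes that only `name` and `segment` fan out and that any list value in those fields is a fan-out list.

```python
from typing import Any

# Fields treated as fan-out fields in this sketch (an assumption; the
# library may treat more fields this way).
FAN_OUT_FIELDS = ("name", "segment")


def expand_fan_out(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand list-valued fan-out fields into one config per metric."""
    lengths = {
        field: len(config[field])
        for field in FAN_OUT_FIELDS
        if isinstance(config.get(field), list)
    }
    if not lengths:
        # Nothing to expand: the config describes a single metric.
        return [dict(config)]
    if len(set(lengths.values())) > 1:
        raise ValueError(f"fan-out fields differ in length: {lengths}")

    count = next(iter(lengths.values()))
    expanded = []
    for i in range(count):
        item = dict(config)  # non-fan-out fields are shared by every instance
        for field in lengths:
            item[field] = config[field][i]  # pair entries by position
        expanded.append(item)
    return expanded


config = {
    "name": ["metric1", "metric2"],
    "segment": [["segment1"], ["segment2"]],
    "dataset": "dataset_name",
}
expanded = expand_fan_out(config)
for item in expanded:
    print(item)
```

Entries are paired by position, so the first name goes with the first segment, and a length mismatch is rejected rather than padded or truncated.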
Basic Structure¶
metrics:
  metric_id:
    metric_type: "metric_name"
    config:
      name: ["metric1", "metric2"]  # Fan-out field
      segment: [["segment1"], ["segment2"]]  # Fan-out field - must match length
      # ... other configuration fields
      dataset: "dataset_name"
datasets:
  dataset_name:
    location: "path/to/data.csv"
Segment Configuration¶
Segments define how to group your data for analysis:
- Use null for no segmentation on a particular metric
- Use ["column_name"] for single-column segmentation
- Use ["col1", "col2"] for multi-column segmentation
- When using fan-out, each segment entry corresponds to one metric
Example:
config:
  name: ["total_metric", "segmented_metric", "multi_segment_metric"]
  segment: [null, ["product_type"], ["product_type", "region"]]
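To make the three segment forms concrete, the sketch below groups a few in-memory rows the way the example configuration describes. It is an illustration in plain Python, not the library's code: `mean_by_segment`, the sample rows, and the `value` column are hypothetical, and a simple mean stands in for whichever metric is configured.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; column names mirror the example configuration.
rows = [
    {"product_type": "loan", "region": "north", "value": 10.0},
    {"product_type": "loan", "region": "south", "value": 20.0},
    {"product_type": "card", "region": "north", "value": 30.0},
]


def mean_by_segment(rows, segment, column="value"):
    """Average `column` per group; `segment=None` means one overall group."""
    keys = segment or []
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of segment column values (empty if None).
        groups[tuple(row[k] for k in keys)].append(row[column])
    return {key: mean(values) for key, values in groups.items()}


names = ["total_metric", "segmented_metric", "multi_segment_metric"]
segments = [None, ["product_type"], ["product_type", "region"]]

# Each name is paired with the segment entry at the same position.
results = {name: mean_by_segment(rows, seg) for name, seg in zip(names, segments)}
for name, result in results.items():
    print(name, result)
```

With null the whole dataset forms a single group, one column gives one result per distinct value, and two columns give one result per distinct combination.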
Available Metrics¶
The following metric types are supported:
- default_accuracy: Default prediction accuracy validation
- ead_accuracy: Exposure at Default accuracy validation
- hosmer_lemeshow: Hosmer-Lemeshow goodness-of-fit test
- jeffreys_test: Jeffreys Bayesian calibration test
- mape: Mean Absolute Percentage Error for scale-independent prediction accuracy
- rmse: Root Mean Squared Error for prediction accuracy assessment
- auc: Area Under the ROC Curve discrimination metric
- gini: Gini coefficient discrimination metric
- population_stability_index: Population Stability Index for distribution shift monitoring
- mean: Mean summary statistic
- median: Median summary statistic
Each metric supports both record_level and summary_level data formats, except mean and median, which work with any data format.
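A misspelled metric_type is easy to miss in a large configuration. The sketch below checks a parsed metrics mapping against the names listed above before anything is run. `SUPPORTED_METRIC_TYPES` and `unknown_metric_types` are hypothetical helpers written for this illustration, not part of the library, and the set contains only the types named in this section.

```python
# Metric type names copied from the list in this section.
SUPPORTED_METRIC_TYPES = frozenset({
    "default_accuracy", "ead_accuracy", "hosmer_lemeshow", "jeffreys_test",
    "mape", "rmse", "auc", "gini", "population_stability_index",
    "mean", "median",
})


def unknown_metric_types(metrics: dict) -> dict:
    """Return {metric_id: metric_type} for entries whose type is not listed."""
    return {
        metric_id: spec.get("metric_type")
        for metric_id, spec in metrics.items()
        if spec.get("metric_type") not in SUPPORTED_METRIC_TYPES
    }


# A parsed `metrics` mapping, as it might look after loading the YAML.
metrics = {
    "pd_auc": {"metric_type": "auc", "config": {"name": ["pd_auc"]}},
    "typo": {"metric_type": "ginni", "config": {"name": ["typo"]}},
}
problems = unknown_metric_types(metrics)
print(problems)
```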
Getting Started¶
- Configuration Overview - Learn about YAML structure and fan-out expansion
- Complete Examples - Full workflow configurations and patterns
Metric Documentation¶
Accuracy Metrics¶
- Default Accuracy - Binary classification accuracy
- EAD Accuracy - Exposure at Default accuracy with confidence intervals
- MAPE - Mean Absolute Percentage Error for scale-independent accuracy
- RMSE - Root Mean Squared Error for continuous prediction accuracy
Statistical Tests¶
- AUC - Area Under Curve for discrimination
- Gini - Gini coefficient for discrimination
- Kolmogorov-Smirnov - KS statistic for discrimination testing
- Hosmer-Lemeshow - Goodness of fit testing
- Jeffreys Test - Bayesian calibration testing
Stability Metrics¶
- Population Stability Index - Distribution shift monitoring and population drift detection
Summary Statistics¶
- Mean - Arithmetic mean calculation with segmentation
- Median - Robust central tendency with quartiles
Each metric documentation includes configuration fields, output columns, data requirements, fan-out examples, and usage notes.
workflows ¶
YAML workflow interface for batch metric execution.
This module provides the YAML workflow approach for using the TNP statistic library. Define metric collections in YAML files and execute them as batches.
Example usage
load_configuration_from_yaml ¶
Load configuration from a YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| yaml_file | str \| Path | Path to YAML file or raw YAML string | required |
Returns:
| Type | Description |
|---|---|
| Configuration | Configuration object that can be used to collect metrics |