Jeffreys Test Metric¶
The jeffreys_test metric evaluates model calibration with a Bayesian approach: it applies a Jeffreys prior, Beta(0.5, 0.5), to the observed default counts and tests whether the mean predicted probability of default (PD) is consistent with the observed default rate.
Metric Type: jeffreys_test
Calibration Assessment¶
The Jeffreys test computes a two-tailed p-value by:
- Starting from the Jeffreys prior Beta(0.5, 0.5), which adds 0.5 to both the default and non-default counts
- Forming the posterior distribution of the default rate: Beta(defaults + 0.5, non-defaults + 0.5)
- Evaluating the posterior CDF F at x, the observed mean PD
- Calculating the p-value as 2 × min(F(x), 1 - F(x))
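The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the names `beta_cdf` and `jeffreys_pvalue` are hypothetical, and the Beta CDF is computed with a standard continued-fraction expansion so the example needs only the standard library. In practice `scipy.stats.beta.cdf(x, a, b)` would be the usual choice.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    delta = 1.0
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction at step m.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x, i.e. the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def jeffreys_pvalue(defaults, volume, mean_pd):
    """Two-tailed p-value of the mean PD under the Jeffreys posterior (hypothetical helper)."""
    a = defaults + 0.5             # prior adds 0.5 to the default count
    b = (volume - defaults) + 0.5  # prior adds 0.5 to the non-default count
    f = beta_cdf(mean_pd, a, b)
    return max(0.0, min(1.0, 2.0 * min(f, 1.0 - f)))


# 12 defaults out of 1,000 accounts against a mean predicted PD of 2%.
p = jeffreys_pvalue(defaults=12, volume=1000, mean_pd=0.02)
print(f"p-value: {p:.4f}")
```

When the mean PD sits at the posterior median, F(x) is 0.5 and the p-value reaches its maximum of 1.0; for example, 5 defaults out of 10 with a mean PD of 0.5 gives a symmetric posterior and a p-value of approximately 1.0.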
Configuration Fields¶
Record-Level Data Format¶
For individual loan/account records:
metrics:
  calibration_test:
    metric_type: "jeffreys_test"
    config:
      name: ["model_calibration"]
      data_format: "record_level"
      prob_def: "predicted_probability" # Column with predicted probabilities (0.0-1.0)
      default: "default_flag" # Column with default indicators (0/1 or boolean)
      segment: [["product_type"]] # Optional: segmentation columns
      dataset: "loan_portfolio"
Summary-Level Data Format¶
For pre-aggregated data:
metrics:
  summary_calibration:
    metric_type: "jeffreys_test"
    config:
      name: ["aggregated_calibration"]
      data_format: "summary_level"
      mean_pd: "avg_probability" # Column with mean probabilities (0.0-1.0)
      defaults: "default_count" # Column with default counts
      volume: "total_count" # Column with total observation counts
      segment: [["risk_grade"]] # Optional: segmentation columns
      dataset: "risk_summary"
Required Fields by Format¶
Record-Level Required¶
- name: Metric name(s)
- data_format: Must be "record_level"
- prob_def: Probability column name
- default: Default indicator column name
- dataset: Dataset reference
Summary-Level Required¶
- name: Metric name(s)
- data_format: Must be "summary_level"
- mean_pd: Mean probability column name
- defaults: Default count column name
- volume: Volume count column name
- dataset: Dataset reference
Optional Fields¶
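- segment: List of segmentation column lists, one entry per metric name (for example [["product_type"]]); use null for an entry that should not be segmented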
Output Columns¶
The metric produces the following output columns:
- group_key: Segmentation group identifier (struct of segment values)
- volume: Total number of observations
- defaults: Total number of defaults
- pd: Mean predicted probability of default
- pvalue: Jeffreys test p-value (0.0 to 1.0)
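To make the volume, defaults and pd columns concrete, here is a small sketch of how record-level rows could be reduced to those three quantities per segment. It assumes that volume is the row count, defaults is the sum of the default flags and pd is the mean of the predicted probabilities; the function name `summarise_records` and the dict-based rows are hypothetical and are not part of the library.

```python
from collections import defaultdict


def summarise_records(records, prob_col, default_col, segment_cols=None):
    """Reduce record-level rows to volume, defaults and mean PD per segment."""
    segment_cols = segment_cols or []
    totals = defaultdict(lambda: {"volume": 0, "defaults": 0, "pd_sum": 0.0})
    for row in records:
        # The group key is the tuple of segment values (empty tuple = whole portfolio).
        key = tuple(row[col] for col in segment_cols)
        group = totals[key]
        group["volume"] += 1
        group["defaults"] += int(bool(row[default_col]))
        group["pd_sum"] += float(row[prob_col])
    return {
        key: {
            "volume": group["volume"],
            "defaults": group["defaults"],
            "pd": group["pd_sum"] / group["volume"],
        }
        for key, group in totals.items()
    }


# Illustrative rows only; column names mirror the record-level example config.
rows = [
    {"product_type": "card", "predicted_probability": 0.25, "default_flag": 1},
    {"product_type": "card", "predicted_probability": 0.75, "default_flag": 0},
    {"product_type": "loan", "predicted_probability": 0.5, "default_flag": False},
]
summary = summarise_records(
    rows, "predicted_probability", "default_flag", ["product_type"]
)
```

Each resulting group supplies exactly the inputs that the summary-level format expects as columns, which is why the two formats can produce the same output shape.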
Fan-out Examples¶
Multiple Calibration Tests¶
metrics:
  model_calibration:
    metric_type: "jeffreys_test"
    config:
      name:
        ["overall_calibration", "segment_calibration", "product_calibration"]
      segment: [null, ["customer_segment"], ["product_type"]]
      data_format: "record_level"
      prob_def: "model_score"
      default: "default_indicator"
      dataset: "validation_data"
This creates three calibration tests:
- Overall portfolio calibration
- Calibration by customer segment
- Calibration by product type
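The pairing of names and segments in the fan-out example can be pictured as a positional zip. The sketch below assumes that the i-th name is matched with the i-th segment entry and that all other fields are shared across the resulting tests; `expand_fan_out` is a hypothetical helper written for illustration, not a library function.

```python
def expand_fan_out(config):
    """Expand one fan-out config into one plain config per metric name."""
    names = config["name"]
    # A missing segment field is treated as "no segmentation" for every name.
    segments = config.get("segment") or [None] * len(names)
    if len(segments) != len(names):
        raise ValueError("name and segment must have the same number of entries")
    shared = {k: v for k, v in config.items() if k not in ("name", "segment")}
    return [dict(shared, name=n, segment=s) for n, s in zip(names, segments)]


fan_out_config = {
    "name": ["overall_calibration", "segment_calibration", "product_calibration"],
    "segment": [None, ["customer_segment"], ["product_type"]],
    "data_format": "record_level",
    "prob_def": "model_score",
    "default": "default_indicator",
    "dataset": "validation_data",
}

for test in expand_fan_out(fan_out_config):
    print(test["name"], "->", test["segment"])
```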
Mixed Data Formats¶
metrics:
  detailed_calibration:
    metric_type: "jeffreys_test"
    config:
      name: ["record_level_calibration"]
      data_format: "record_level"
      prob_def: "probability"
      default: "default"
      dataset: "detailed_data"
  summary_calibration:
    metric_type: "jeffreys_test"
    config:
      name: ["summary_calibration"]
      data_format: "summary_level"
      mean_pd: "mean_prob"
      defaults: "def_count"
      volume: "vol_count"
      dataset: "summary_data"
Interpretation¶
P-value Guidelines¶
- High p-value (≥ 0.05): Good calibration - predicted probabilities consistent with observed rates
- Low p-value (< 0.05): Poor calibration - significant difference between predicted and observed rates
- Very low p-value (< 0.01): Very poor calibration - substantial miscalibration
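These thresholds translate directly into a small lookup. The helper below is a hypothetical convenience for labelling the pvalue output column with the bands listed above; the 0.05 and 0.01 cut-offs come from the guidelines, while the function name and label strings are illustrative.

```python
def interpret_pvalue(pvalue):
    """Map a Jeffreys test p-value to the guideline bands above."""
    if not 0.0 <= pvalue <= 1.0:
        raise ValueError("p-value must lie between 0.0 and 1.0")
    if pvalue < 0.01:
        return "very poor calibration"
    if pvalue < 0.05:
        return "poor calibration"
    return "good calibration"


for value in (0.40, 0.03, 0.002):
    print(value, "->", interpret_pvalue(value))
```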
Calibration Quality¶
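- Well-calibrated models have p-values ≥ 0.05, meaning the test finds no significant gap between predicted and observed rates
- Models requiring recalibration typically have p-values < 0.05
- P-values near 1.0 indicate the closest agreement: the mean PD then sits near the median of the posterior, where F(x) ≈ 0.5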
Data Requirements¶
Record-Level Data¶
- One row per loan/account
- Probability column: numeric values between 0.0 and 1.0
- Default column: binary values (0/1 or boolean)
Summary-Level Data¶
- One row per group/segment
- Mean probability: numeric values between 0.0 and 1.0
- Default counts: non-negative numbers or None (negative values not allowed)
- Volume counts: non-negative numbers or None (negative values not allowed)
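A pre-flight check along these lines can catch bad summary rows before the metric runs. This is a sketch only: `validate_summary_row` is a hypothetical helper, and the final check (defaults not exceeding volume) is an added assumption that follows from the posterior needing a non-negative non-default count, not a rule stated above.

```python
def validate_summary_row(mean_pd, defaults, volume):
    """Return a list of problems found in one summary-level row (empty if none)."""
    problems = []
    if not 0.0 <= mean_pd <= 1.0:
        problems.append("mean_pd must be between 0.0 and 1.0")
    for label, value in (("defaults", defaults), ("volume", volume)):
        # None is allowed; negative counts are not.
        if value is not None and value < 0:
            problems.append(f"{label} must not be negative")
    # Assumption (not from the requirements list): defaults cannot exceed volume.
    if defaults is not None and volume is not None and defaults > volume:
        problems.append("defaults must not exceed volume")
    return problems


print(validate_summary_row(mean_pd=0.02, defaults=12, volume=1000))
print(validate_summary_row(mean_pd=1.2, defaults=-1, volume=None))
```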