Binomial Test Metric¶
The binomial_test metric performs a statistical hypothesis test to determine whether an observed default rate differs significantly from an expected probability specified under the null hypothesis. This makes it particularly valuable in credit risk modeling for validating whether portfolio default rates align with model expectations.
Metric Type: binomial_test
Binomial Test Calculation¶
The binomial test evaluates the null hypothesis that the observed proportion of defaults equals the expected probability:
- H₀: p = p₀ (the true default probability equals the expected rate)
- H₁: p ≠ p₀ (the true default probability differs from the expected rate)
Where:
- p = true population default probability underlying the observed defaults
- p₀ = expected default probability under the null hypothesis
P-values are calculated from the exact binomial distribution.
The binomial test is non-parametric and provides exact p-values regardless of sample size, making it suitable for small portfolios or segments where normal approximations may be inappropriate.
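For intuition, the following sketch (not part of the metric configuration) reproduces the exact two-sided p-value for a hypothetical segment using scipy.stats.binomtest; the counts and expected rate are invented for the example:

from scipy.stats import binomtest

# Hypothetical segment: 14 defaults observed among 200 accounts,
# tested against an expected default rate of 5%.
defaults = 14
volume = 200
expected_probability = 0.05

result = binomtest(k=defaults, n=volume, p=expected_probability,
                   alternative="two-sided")

print(f"observed rate: {defaults / volume:.3f}")   # 0.070
print(f"two-sided p-value: {result.pvalue:.4f}")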
Configuration Fields¶
Record-Level Data Format¶
For individual account records with default indicators:
collections:
  default_rate_test:
    metrics:
      - name:
          - portfolio_default_test
        data_format: record
        default: default_flag
        expected_probability: 0.05
        segment:
          - - risk_grade
        metric_type: binomial_test
    dataset: loan_portfolio
Summary-Level Data Format¶
For pre-aggregated default count data:
collections:
  aggregated_default_test:
    metrics:
      - name:
          - segment_default_test
        data_format: summary
        volume: total_accounts
        defaults: default_count
        expected_probability: 0.03
        segment:
          - - portfolio_segment
        metric_type: binomial_test
    dataset: portfolio_summary
Required Fields by Format¶
Record-Level Required¶
- name: Metric name(s)
- data_format: Must be "record"
- default: Default indicator column name (binary: 0/1 or boolean)
- expected_probability: Expected default rate (float between 0.0 and 1.0)
- dataset: Dataset reference
Summary-Level Required¶
- name: Metric name(s)
- data_format: Must be "summary"
- volume: Total account count column name
- defaults: Default count column name
- expected_probability: Expected default rate (float between 0.0 and 1.0)
- dataset: Dataset reference
Optional Fields¶
- segment: List of column-name lists used for grouping; each inner list is paired with the corresponding entry in name (see the fan-out examples below)
Output Columns¶
The metric produces the following output columns (an illustrative example follows the list):
- group_key: Segmentation group identifier (struct of segment values)
- volume: Total number of observations
- defaults: Number of observed defaults
- observed_probability: Observed default rate (defaults / volume)
- expected_probability: Expected default rate under the null hypothesis
- p_value: Two-tailed p-value from the exact binomial test
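A minimal sketch of what one output row could contain, using invented values and a hypothetical risk_grade segment column; the p-value is computed here with scipy.stats.binomtest purely for illustration, and the metric's own output representation may differ:

from scipy.stats import binomtest

# Hypothetical output row for a record-level run segmented by risk_grade.
row = {
    "group_key": {"risk_grade": "B"},
    "volume": 1200,
    "defaults": 54,
    "observed_probability": 54 / 1200,   # defaults / volume = 0.045
    "expected_probability": 0.05,
}
row["p_value"] = binomtest(k=row["defaults"], n=row["volume"],
                           p=row["expected_probability"]).pvalue
print(row)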
Fan-out Examples¶
The examples below show how a single metric entry can fan out into several named tests, with each entry in name paired with the corresponding entry in segment.
Single Portfolio Test¶
collections:
  portfolio_test:
    metrics:
      - name:
          - overall_default_test
        data_format: record
        default: default_indicator
        expected_probability: 0.04
        metric_type: binomial_test
    dataset: quarterly_portfolio
Segmented Analysis¶
collections:
  segmented_tests:
    metrics:
      - name:
          - grade_test
          - region_test
          - product_test
        data_format: record
        default: default_flag
        expected_probability: 0.06
        segment:
          - - risk_grade
          - - region
          - - product_type
        metric_type: binomial_test
    dataset: validation_data
Mixed Data Formats¶
collections:
  detailed_test:
    metrics:
      - name:
          - account_level_test
        data_format: record
        default: default_indicator
        expected_probability: 0.025
        metric_type: binomial_test
    dataset: account_data
  summary_test:
    metrics:
      - name:
          - portfolio_summary_test
        data_format: summary
        volume: account_count
        defaults: default_count
        expected_probability: 0.025
        metric_type: binomial_test
    dataset: portfolio_aggregates
Time Series Validation¶
collections:
  monthly_validation:
    metrics:
      - name:
          - jan_test
          - feb_test
          - mar_test
        data_format: summary
        volume: monthly_accounts
        defaults: monthly_defaults
        expected_probability: 0.035
        segment:
          - - month_jan
          - - month_feb
          - - month_mar
        metric_type: binomial_test
    dataset: monthly_summary
Data Requirements¶
Record-Level Data¶
- One row per account/observation
- Default column: binary indicators (0/1, true/false, or boolean)
- Default values must map to exactly 0 or 1, with no missing values
- Volume is calculated automatically as the row count (see the sketch below)
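As a rough illustration of that reduction, a pandas sketch with hypothetical default_flag and risk_grade columns:

import pandas as pd

# Hypothetical record-level data: one row per account.
records = pd.DataFrame({
    "risk_grade": ["A", "A", "B", "B", "B"],
    "default_flag": [0, 1, 0, 0, 1],
})

# Volume is the per-segment row count; defaults is the sum of the 0/1 flags.
summary = (records.groupby("risk_grade")["default_flag"]
                  .agg(volume="count", defaults="sum")
                  .reset_index())
print(summary)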
Summary-Level Data¶
- One row per group/segment
- Volume counts: positive integers (total accounts in segment)
- Default counts: non-negative integers ≤ volume
- Must satisfy 0 ≤ defaults ≤ volume for each row (see the validation sketch below)
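A minimal validation sketch for summary-level inputs, assuming hypothetical column names total_accounts and default_count:

import pandas as pd

# Hypothetical summary-level data: one row per segment.
summary = pd.DataFrame({
    "portfolio_segment": ["retail", "sme"],
    "total_accounts": [1500, 420],
    "default_count": [45, 9],
})

# Enforce the constraints listed above:
# positive volumes and 0 <= defaults <= volume on every row.
assert (summary["total_accounts"] > 0).all()
assert (summary["default_count"] >= 0).all()
assert (summary["default_count"] <= summary["total_accounts"]).all()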
Statistical Interpretation¶
P-Value Guidelines¶
- p < 0.001: Highly significant difference from expected rate
- 0.001 ≤ p < 0.01: Very significant difference
- 0.01 ≤ p < 0.05: Significant difference (commonly used threshold)
- 0.05 ≤ p < 0.10: Marginally significant
- p ≥ 0.10: No significant difference from expected rate
Practical Significance¶
Consider both statistical and practical significance:
- Small p-values don't always indicate practically important differences
- Large sample sizes can detect trivial differences as statistically significant
- Consider confidence intervals and effect sizes alongside p-values (see the sketch below)
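One way to complement the p-value is a confidence interval for the observed default rate; a sketch using scipy.stats.binomtest with invented counts:

from scipy.stats import binomtest

# Hypothetical segment: 30 defaults among 500 accounts, expected rate 4%.
result = binomtest(k=30, n=500, p=0.04)

# The Clopper-Pearson (exact) interval is scipy's default.
ci = result.proportion_ci(confidence_level=0.95)
print(f"observed rate: {30 / 500:.3f}")
print(f"95% CI: [{ci.low:.3f}, {ci.high:.3f}]")
print(f"two-sided p-value: {result.pvalue:.4f}")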
Credit Risk Applications¶
Model Validation¶
- Test if observed default rates match model predictions
- Validate PD model calibration across risk grades
- Assess model performance over different time periods
Regulatory Compliance¶
- IFRS 9 model validation requirements
- Basel III capital adequacy assessments
- Stress testing default rate assumptions
Portfolio Monitoring¶
- Detect early warning signals of deteriorating credit quality
- Monitor default rates against business plan assumptions
- Assess impact of economic conditions on default behavior
Important Notes¶
- Exact Test: Uses exact binomial distribution (not normal approximation)
- Two-Tailed: Tests for any significant difference (higher or lower than expected)
- Sample Size: No minimum sample size requirement (works with small portfolios)
- Independence: Assumes observations are independent
- Expected Rate: Must be between 0.0 and 1.0 (0% to 100%)
- Data Quality: Ensure default indicators are correctly coded (0/1 only)
- Missing Data: Remove missing values before testing
- Multiple Testing: Consider adjusting significance levels when testing multiple segments (see the sketch below)
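A common way to do this is to adjust the per-segment p-values; for example, with statsmodels (the p-values here are invented):

from statsmodels.stats.multitest import multipletests

# Hypothetical two-tailed p-values from binomial tests on several segments.
p_values = [0.012, 0.048, 0.300, 0.004]

# Benjamini-Hochberg false discovery rate control at 5%.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, p_adj, rej in zip(p_values, p_adjusted, reject):
    print(f"p={p:.3f}  adjusted={p_adj:.3f}  reject H0: {rej}")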
Example Credit Risk Use Cases¶
PD Model Validation¶
# Test if observed defaults match PD model predictions by risk grade
collections:
  pd_validation:
    dataset: model_validation_data
    metrics:
      - metric_type: binomial_test
        data_format: record
        name: pd_grade_validation
        default: default_12m
        expected_probability: 0.08  # Model predicted 8% default rate
        segment: ["internal_rating"]
Economic Stress Testing¶
# Test if stress scenario default rates are significantly different from baseline
collections:
  stress_test:
    dataset: stress_test_results
    metrics:
      - metric_type: binomial_test
        data_format: summary
        name: baseline_vs_stress
        volume: portfolio_size
        defaults: stress_defaults
        expected_probability: 0.045  # Baseline default rate
        segment: ["scenario"]
Portfolio Drift Monitoring¶
# Monitor if current portfolio default behavior differs from historical patterns
collections:
  drift_monitoring:
    dataset: quarterly_performance
    metrics:
      - metric_type: binomial_test
        data_format: record
        name: quarterly_drift_test
        default: default_flag
        expected_probability: 0.055  # Historical average default rate
        segment: [["quarter", "product_line"]]