Binomial Test Metric¶
The binomial_test metric performs an exact statistical hypothesis test of whether an observed default rate differs significantly from an expected probability. It is particularly valuable in credit risk modeling for validating whether portfolio default rates align with model expectations.
Metric Type: binomial_test
Binomial Test Calculation¶
The binomial test evaluates the null hypothesis that the observed proportion of defaults equals the expected probability:
- H₀: p = p₀ (observed default rate equals expected rate)
- H₁: p ≠ p₀ (observed default rate differs from expected rate)
Where:
- p = the true population default probability
- p₀ = the expected default probability under the null hypothesis

The test computes exact p-values directly from the binomial distribution rather than a normal approximation, so results remain valid at any sample size. This makes it suitable for small portfolios or segments where normal approximations would be inappropriate.
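For intuition, the p-value can be reproduced with any exact binomial implementation. Below is a minimal sketch using scipy.stats.binomtest (SciPy is not part of this metric's API, and all numbers are illustrative):

# Minimal sketch of the underlying test; the metric computes this internally.
from scipy.stats import binomtest

volume = 1000                 # total observations
defaults = 62                 # observed defaults
expected_probability = 0.05   # p0 under the null hypothesis

result = binomtest(defaults, volume, p=expected_probability, alternative="two-sided")
print(f"observed_probability: {defaults / volume:.4f}")
print(f"p_value: {result.pvalue:.4f}")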
Configuration Fields¶
Record-Level Data Format¶
For individual account records with default indicators:
metrics:
  default_rate_test:
    metric_type: "binomial_test"
    config:
      name: ["portfolio_default_test"]
      data_format: "record_level"
      default: "default_flag"          # Column with default indicators (0/1 or boolean)
      expected_probability: 0.05       # Expected default rate (5%)
      segment: [["risk_grade"]]        # Optional: segmentation columns
      dataset: "loan_portfolio"
Summary-Level Data Format¶
For pre-aggregated default count data:
metrics:
  aggregated_default_test:
    metric_type: "binomial_test"
    config:
      name: ["segment_default_test"]
      data_format: "summary_level"
      volume: "total_accounts"         # Column with total account counts
      defaults: "default_count"        # Column with default counts
      expected_probability: 0.03       # Expected default rate (3%)
      segment: [["portfolio_segment"]] # Optional: segmentation columns
      dataset: "portfolio_summary"
Required Fields by Format¶
Record-Level Required¶
name: Metric name(s)
data_format: Must be "record_level"
default: Default indicator column name (binary: 0/1 or boolean)
expected_probability: Expected default rate (float between 0.0 and 1.0)
dataset: Dataset reference
Summary-Level Required¶
name: Metric name(s)
data_format: Must be "summary_level"
volume: Total account count column name
defaults: Default count column name
expected_probability: Expected default rate (float between 0.0 and 1.0)
dataset: Dataset reference
Optional Fields¶
segment: List of column-name lists for grouping; each inner list defines one segmentation and fans out into its own test
Output Columns¶
The metric produces the following output columns:
group_key: Segmentation group identifier (struct of segment values)
volume: Total number of observations
defaults: Number of observed defaults
observed_probability: Observed default rate (defaults / volume)
expected_probability: Expected default rate under null hypothesis
p_value: Two-tailed p-value from exact binomial test
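As a rough sketch of how these columns relate, assuming record-level data in a pandas DataFrame (the DataFrame, its columns, and the segment column are illustrative):

import pandas as pd
from scipy.stats import binomtest

# Illustrative record-level data; one row per account.
df = pd.DataFrame({
    "risk_grade":   ["A", "A", "B", "B", "B"],
    "default_flag": [0, 1, 0, 0, 1],
})
expected_probability = 0.05

rows = []
for grade, grp in df.groupby("risk_grade"):
    volume = len(grp)                          # total observations in the group
    defaults = int(grp["default_flag"].sum())  # observed defaults
    rows.append({
        "group_key": {"risk_grade": grade},
        "volume": volume,
        "defaults": defaults,
        "observed_probability": defaults / volume,
        "expected_probability": expected_probability,
        "p_value": binomtest(defaults, volume, p=expected_probability).pvalue,
    })
print(pd.DataFrame(rows))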
Fan-out Examples¶
Single Portfolio Test¶
metrics:
  portfolio_test:
    metric_type: "binomial_test"
    config:
      name: ["overall_default_test"]
      data_format: "record_level"
      default: "default_indicator"
      expected_probability: 0.04
      dataset: "quarterly_portfolio"
Segmented Analysis¶
metrics:
  segmented_tests:
    metric_type: "binomial_test"
    config:
      name: ["grade_test", "region_test", "product_test"]
      data_format: "record_level"
      default: "default_flag"
      expected_probability: 0.06
      segment: [["risk_grade"], ["region"], ["product_type"]]
      dataset: "validation_data"
Mixed Data Formats¶
metrics:
  detailed_test:
    metric_type: "binomial_test"
    config:
      name: ["account_level_test"]
      data_format: "record_level"
      default: "default_indicator"
      expected_probability: 0.025
      dataset: "account_data"
  summary_test:
    metric_type: "binomial_test"
    config:
      name: ["portfolio_summary_test"]
      data_format: "summary_level"
      volume: "account_count"
      defaults: "default_count"
      expected_probability: 0.025
      dataset: "portfolio_aggregates"
Time Series Validation¶
metrics:
  monthly_validation:
    metric_type: "binomial_test"
    config:
      name: ["jan_test", "feb_test", "mar_test"]
      data_format: "summary_level"
      volume: "monthly_accounts"
      defaults: "monthly_defaults"
      expected_probability: 0.035
      segment: [["month_jan"], ["month_feb"], ["month_mar"]]
      dataset: "monthly_summary"
Data Requirements¶
Record-Level Data¶
- One row per account/observation
- Default column: binary indicators (0/1, true/false, or boolean)
- Default values must encode unambiguously to 0 or 1, with no missing values
- Volume automatically calculated as row count
Summary-Level Data¶
- One row per group/segment
- Volume counts: positive integers (total accounts in segment)
- Default counts: non-negative integers ≤ volume
- Must satisfy: 0 ≤ defaults ≤ volume for each row
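The two formats carry the same information: aggregating record-level data yields summary-level data. A sketch of that reduction plus the summary-level sanity checks, with illustrative names:

import pandas as pd

# Record-level input: one row per account (illustrative).
records = pd.DataFrame({
    "portfolio_segment": ["retail", "retail", "sme", "sme"],
    "default_flag":      [0, 1, 0, 0],
})

# Reduce to summary level: volume is the row count, defaults the flag sum.
summary = (records.groupby("portfolio_segment")["default_flag"]
           .agg(volume="count", defaults="sum")
           .reset_index())

# Summary-level sanity checks: volume > 0 and 0 <= defaults <= volume per row.
assert (summary["volume"] > 0).all()
assert ((summary["defaults"] >= 0) & (summary["defaults"] <= summary["volume"])).all()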
Statistical Interpretation¶
P-Value Guidelines¶
- p < 0.001: Highly significant difference from expected rate
- 0.001 ≤ p < 0.01: Very significant difference
- 0.01 ≤ p < 0.05: Significant difference (commonly used threshold)
- 0.05 ≤ p < 0.10: Marginally significant
- p ≥ 0.10: No significant difference from expected rate
Practical Significance¶
Consider both statistical and practical significance:
- Small p-values don't always indicate practically important differences
- Large sample sizes can detect trivial differences as statistically significant
- Consider confidence intervals and effect sizes alongside p-values
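As one way to act on this, scipy's binomtest result also exposes an exact (Clopper-Pearson) confidence interval for the observed proportion; the numbers below are illustrative:

from scipy.stats import binomtest

# Large sample: a small absolute difference can still be highly "significant".
result = binomtest(5300, 100000, p=0.05)
ci = result.proportion_ci(confidence_level=0.95)  # exact (Clopper-Pearson) by default

print(f"observed proportion: {result.statistic:.4f}")        # 0.0530
print(f"p_value:             {result.pvalue:.2e}")           # tiny despite a 0.3pp gap
print(f"95% CI:              [{ci.low:.4f}, {ci.high:.4f}]")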
Credit Risk Applications¶
Model Validation¶
- Test if observed default rates match model predictions
- Validate PD model calibration across risk grades
- Assess model performance over different time periods
Regulatory Compliance¶
- IFRS 9 model validation requirements
- Basel III capital adequacy assessments
- Stress testing default rate assumptions
Portfolio Monitoring¶
- Detect early warning signals of deteriorating credit quality
- Monitor default rates against business plan assumptions
- Assess impact of economic conditions on default behavior
Important Notes¶
- Exact Test: Uses exact binomial distribution (not normal approximation)
- Two-Tailed: Tests for any significant difference (higher or lower than expected)
- Sample Size: No minimum sample size requirement (works with small portfolios)
- Independence: Assumes observations are independent
- Expected Rate: Must be between 0.0 and 1.0 (0% to 100%)
- Data Quality: Ensure default indicators are correctly coded (0/1 only)
- Missing Data: Remove missing values before testing
- Multiple Testing: Consider adjusting significance levels when testing multiple segments (see the sketch below)
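A minimal sketch of one such adjustment (Holm's method) using statsmodels; the p-values are illustrative, and the metric itself is assumed to report unadjusted values:

from statsmodels.stats.multitest import multipletests

# Unadjusted two-tailed p-values from several segment-level tests (illustrative).
p_values = [0.003, 0.020, 0.210, 0.047]

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
for p, p_adj, rej in zip(p_values, p_adjusted, reject):
    print(f"raw={p:.3f}  holm={p_adj:.3f}  reject_H0={rej}")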
Example Credit Risk Use Cases¶
PD Model Validation¶
# Test if observed defaults match PD model predictions by risk grade
pd_validation:
  metric_type: "binomial_test"
  config:
    name: ["pd_grade_validation"]
    data_format: "record_level"
    default: "default_12m"
    expected_probability: 0.08       # Model-predicted 8% default rate
    segment: [["internal_rating"]]
    dataset: "model_validation_data"
Economic Stress Testing¶
# Test if stress-scenario default rates differ significantly from baseline
stress_test:
  metric_type: "binomial_test"
  config:
    name: ["baseline_vs_stress"]
    data_format: "summary_level"
    volume: "portfolio_size"
    defaults: "stress_defaults"
    expected_probability: 0.045      # Baseline default rate
    segment: [["scenario"]]
    dataset: "stress_test_results"
Portfolio Drift Monitoring¶
# Monitor whether current portfolio default behavior differs from historical patterns
drift_monitoring:
  metric_type: "binomial_test"
  config:
    name: ["quarterly_drift_test"]
    data_format: "record_level"
    default: "default_flag"
    expected_probability: 0.055      # Historical average default rate
    segment: [["quarter"], ["product_line"]]
    dataset: "quarterly_performance"