Binomial Test Metric

The binomial_test metric performs statistical hypothesis testing to determine whether an observed default rate differs significantly from the expected probability specified by the null hypothesis. This test is particularly valuable in credit risk modeling for validating whether portfolio default rates align with model expectations.

Metric Type: binomial_test

Binomial Test Calculation

The binomial test evaluates the null hypothesis that the observed proportion of defaults equals the expected probability:

  • H₀: p = p₀ (observed default rate equals expected rate)
  • H₁: p ≠ p₀ (observed default rate differs from expected rate)

Where:

  • p = true population default probability
  • p₀ = expected default probability under the null hypothesis

The test computes exact p-values directly from the binomial distribution rather than relying on a normal approximation, so its results remain valid at any sample size. This makes it suitable for small portfolios or segments where normal approximations break down.
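
A quick sketch of the underlying calculation using SciPy's binomtest (a sketch under the assumption that the metric matches SciPy's exact two-sided method; the counts below are hypothetical):

from scipy.stats import binomtest

# Hypothetical portfolio: 12 defaults observed among 200 accounts
defaults = 12
volume = 200
expected_probability = 0.05  # p0 under H0

# Exact two-sided binomial test of H0: p = p0
result = binomtest(defaults, volume, p=expected_probability, alternative="two-sided")
print(result.pvalue)  # two-tailed p-value from the exact binomial distribution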

Configuration Fields

Record-Level Data Format

For individual account records with default indicators:

metrics:
  default_rate_test:
    metric_type: "binomial_test"
    config:
      name: ["portfolio_default_test"]
      data_format: "record_level"
      default: "default_flag" # Column with default indicators (0/1 or boolean)
      expected_probability: 0.05 # Expected default rate (5%)
      segment: [["risk_grade"]] # Optional: segmentation columns
      dataset: "loan_portfolio"

Summary-Level Data Format

For pre-aggregated default count data:

metrics:
  aggregated_default_test:
    metric_type: "binomial_test"
    config:
      name: ["segment_default_test"]
      data_format: "summary_level"
      volume: "total_accounts" # Column with total account counts
      defaults: "default_count" # Column with default counts
      expected_probability: 0.03 # Expected default rate (3%)
      segment: [["portfolio_segment"]] # Optional: segmentation columns
      dataset: "portfolio_summary"

Required Fields by Format

Record-Level Required

  • name: Metric name(s)
  • data_format: Must be "record_level"
  • default: Default indicator column name (binary: 0/1 or boolean)
  • expected_probability: Expected default rate (float between 0.0 and 1.0)
  • dataset: Dataset reference

Summary-Level Required

  • name: Metric name(s)
  • data_format: Must be "summary_level"
  • volume: Total account count column name
  • defaults: Default count column name
  • expected_probability: Expected default rate (float between 0.0 and 1.0)
  • dataset: Dataset reference

Optional Fields

  • segment: List of column names for grouping

Output Columns

The metric produces the following output columns:

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Total number of observations
  • defaults: Number of observed defaults
  • observed_probability: Observed default rate (defaults/volume)
  • expected_probability: Expected default rate under null hypothesis
  • p_value: Two-tailed p-value from exact binomial test
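
A sketch of how these columns relate for a single group, with SciPy's exact test standing in for the metric's internal implementation (counts are hypothetical):

from scipy.stats import binomtest

# Hypothetical group: 71 defaults among 1250 accounts, expected rate 5%
volume = 1250
defaults = 71
expected_probability = 0.05

observed_probability = defaults / volume  # defaults/volume, as defined above
p_value = binomtest(defaults, volume, p=expected_probability,
                    alternative="two-sided").pvalue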

Fan-out Examples

In the examples below, list-valued fields such as name and segment pair up element-wise, so a single config fans out into one metric per pair, while scalar fields like expected_probability apply to every metric in the fan-out.

Single Portfolio Test

metrics:
  portfolio_test:
    metric_type: "binomial_test"
    config:
      name: ["overall_default_test"]
      data_format: "record_level"
      default: "default_indicator"
      expected_probability: 0.04
      dataset: "quarterly_portfolio"

Segmented Analysis

metrics:
  segmented_tests:
    metric_type: "binomial_test"
    config:
      name: ["grade_test", "region_test", "product_test"]
      data_format: "record_level"
      default: "default_flag"
      expected_probability: 0.06
      segment: [["risk_grade"], ["region"], ["product_type"]]
      dataset: "validation_data"

Mixed Data Formats

metrics:
  detailed_test:
    metric_type: "binomial_test"
    config:
      name: ["account_level_test"]
      data_format: "record_level"
      default: "default_indicator"
      expected_probability: 0.025
      dataset: "account_data"

  summary_test:
    metric_type: "binomial_test"
    config:
      name: ["portfolio_summary_test"]
      data_format: "summary_level"
      volume: "account_count"
      defaults: "default_count"
      expected_probability: 0.025
      dataset: "portfolio_aggregates"

Time Series Validation

metrics:
  monthly_validation:
    metric_type: "binomial_test"
    config:
      name: ["jan_test", "feb_test", "mar_test"]
      data_format: "summary_level"
      volume: "monthly_accounts"
      defaults: "monthly_defaults"
      expected_probability: 0.035
      segment: [["month_jan"], ["month_feb"], ["month_mar"]]
      dataset: "monthly_summary"

Data Requirements

Record-Level Data

  • One row per account/observation
  • Default column: binary indicators (0/1 or true/false)
  • No missing values are allowed in the default column
  • Volume automatically calculated as row count
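
For instance, record-level data reduces to the counts the test needs via a simple aggregation (a pandas sketch; column and segment names are hypothetical):

import pandas as pd

records = pd.DataFrame({
    "risk_grade": ["A", "A", "B", "B", "B"],
    "default_flag": [0, 1, 0, 0, 1],
})

# Volume is the row count per segment; defaults is the sum of the 0/1 flags
summary = records.groupby("risk_grade")["default_flag"].agg(
    volume="count", defaults="sum"
).reset_index()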

Summary-Level Data

  • One row per group/segment
  • Volume counts: positive integers (total accounts in segment)
  • Default counts: non-negative integers ≤ volume
  • Must satisfy: 0 ≤ defaults ≤ volume for each row
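
These constraints can be checked up front before running the test (a pandas sketch; column names are hypothetical):

import pandas as pd

summary = pd.DataFrame({
    "total_accounts": [500, 800],
    "default_count": [20, 31],
})

# Volumes must be positive; defaults must satisfy 0 <= defaults <= volume per row
assert (summary["total_accounts"] > 0).all()
assert summary["default_count"].between(0, summary["total_accounts"]).all()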

Statistical Interpretation

P-Value Guidelines

  • p < 0.001: Highly significant difference from expected rate
  • 0.001 ≤ p < 0.01: Very significant difference
  • 0.01 ≤ p < 0.05: Significant difference (commonly used threshold)
  • 0.05 ≤ p < 0.10: Marginally significant
  • p ≥ 0.10: No significant difference from expected rate
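
These bands translate directly into a lookup helper (a hypothetical convenience function, using the thresholds listed above):

def interpret_p_value(p: float) -> str:
    """Map a two-tailed p-value to the guideline bands above."""
    if p < 0.001:
        return "highly significant"
    if p < 0.01:
        return "very significant"
    if p < 0.05:
        return "significant"
    if p < 0.10:
        return "marginally significant"
    return "not significant"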

Practical Significance

Consider both statistical and practical significance:

  • Small p-values don't always indicate practically important differences
  • Large sample sizes can detect trivial differences as statistically significant
  • Consider confidence intervals and effect sizes alongside p-values
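
One way to do this is to pair the exact p-value with an exact (Clopper-Pearson) confidence interval for the observed rate (a SciPy sketch; counts are hypothetical):

from scipy.stats import binomtest

result = binomtest(71, 1250, p=0.05, alternative="two-sided")
ci = result.proportion_ci(confidence_level=0.95, method="exact")

# If the interval excludes the expected rate, also ask whether the gap is
# large enough to matter economically, not just statistically
print(result.pvalue, ci.low, ci.high)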

Credit Risk Applications

Model Validation

  • Test if observed default rates match model predictions
  • Validate PD model calibration across risk grades
  • Assess model performance over different time periods

Regulatory Compliance

  • IFRS 9 model validation requirements
  • Basel III capital adequacy assessments
  • Stress testing default rate assumptions

Portfolio Monitoring

  • Detect early warning signals of deteriorating credit quality
  • Monitor default rates against business plan assumptions
  • Assess impact of economic conditions on default behavior

Important Notes

  1. Exact Test: Uses exact binomial distribution (not normal approximation)
  2. Two-Tailed: Tests for any significant difference (higher or lower than expected)
  3. Sample Size: No minimum sample size requirement (works with small portfolios)
  4. Independence: Assumes observations are independent
  5. Expected Rate: Must be between 0.0 and 1.0 (0% to 100%)
  6. Data Quality: Ensure default indicators are correctly coded (0/1 only)
  7. Missing Data: Remove missing values before testing
  8. Multiple Testing: Consider adjusting significance levels when testing multiple segments (see the sketch after this list)
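
A minimal Bonferroni adjustment for the multiple-testing note above (segment names and p-values are hypothetical):

alpha = 0.05
p_values = {"grade_A": 0.012, "grade_B": 0.20, "grade_C": 0.004}  # per-segment results

# Bonferroni: divide the significance level by the number of tests run
adjusted_alpha = alpha / len(p_values)
flagged = [seg for seg, p in p_values.items() if p < adjusted_alpha]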

Example Credit Risk Use Cases

PD Model Validation

# Test if observed defaults match PD model predictions by risk grade
pd_validation:
  metric_type: "binomial_test"
  config:
    name: ["pd_grade_validation"]
    data_format: "record_level"
    default: "default_12m"
    expected_probability: 0.08  # Model predicted 8% default rate
    segment: [["internal_rating"]]
    dataset: "model_validation_data"

Economic Stress Testing

# Test if stress scenario default rates are significantly different from baseline
stress_test:
  metric_type: "binomial_test"
  config:
    name: ["baseline_vs_stress"]
    data_format: "summary_level"
    volume: "portfolio_size"
    defaults: "stress_defaults"
    expected_probability: 0.045  # Baseline default rate
    segment: [["scenario"]]
    dataset: "stress_test_results"

Portfolio Drift Monitoring

# Monitor if current portfolio default behavior differs from historical patterns
drift_monitoring:
  metric_type: "binomial_test"
  config:
    name: ["quarterly_drift_test"]
    data_format: "record_level"
    default: "default_flag"
    expected_probability: 0.055  # Historical average default rate
    segment: [["quarter"], ["product_line"]]
    dataset: "quarterly_performance"