
Shapiro-Wilk Normality Test

The shapiro_wilk metric performs the Shapiro-Wilk test to assess whether a dataset follows a normal distribution. This test is considered one of the most powerful normality tests, especially for small to medium sample sizes, and is essential for validating assumptions in statistical modeling.

Metric Type: shapiro_wilk

Shapiro-Wilk Test

The Shapiro-Wilk test evaluates the null hypothesis that data comes from a normal distribution. It calculates a test statistic (W) that measures how well the data fits a normal distribution pattern.

Test Statistic Range: 0.0 to 1.0 (higher values indicate more normal-like data)

Hypotheses:

  • H0 (Null): The data follows a normal distribution
  • H1 (Alternative): The data does not follow a normal distribution

Decision Rule: If p-value < α (significance level), reject H0 and conclude the data is not normally distributed.
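
A minimal sketch of this decision rule, assuming scipy.stats.shapiro as the underlying routine (the scipy implementation is referenced in the data requirements below); the sample data and alpha value are illustrative only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=100)  # illustrative data only

w_statistic, p_value = stats.shapiro(sample)  # W in [0, 1] and the test's p-value

alpha = 0.05  # significance level chosen by the analyst
if p_value < alpha:
    print(f"W={w_statistic:.3f}, p={p_value:.3f}: reject H0 (not normally distributed)")
else:
    print(f"W={w_statistic:.3f}, p={p_value:.3f}: fail to reject H0 (consistent with normality)")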

Configuration Fields

Record-Level Data Format

For testing normality of individual observations:

collections:
  residual_normality:
    metrics:
    - name:
      - model_residual_test
      data_format: record
      data_column: residuals
      segment:
      - - model_version
      metric_type: shapiro_wilk
    dataset: model_validation_data
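
A hypothetical slice of the record-level input this configuration expects, with one observation per row; the column and dataset names mirror the example above.

import pandas as pd

# Hypothetical record-level input: one row per observation, with the
# configured data_column ("residuals") and segment column ("model_version").
model_validation_data = pd.DataFrame({
    "model_version": ["v1", "v1", "v1", "v2", "v2", "v2"],
    "residuals":     [0.12, -0.05, 0.31, -0.22, 0.08, 0.14],
})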

Summary-Level Data Format

For aggregating pre-computed Shapiro-Wilk test statistics:

collections:
  aggregated_normality:
    metrics:
    - name:
      - combined_normality_test
      data_format: summary
      volume: sample_size
      statistic: w_statistic
      p_value: p_values
      segment:
      - - data_source
      metric_type: shapiro_wilk
    dataset: normality_results
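
One way the summary-level inputs might be produced upstream, sketched in Python: run the test per segment and write out the columns named in the configuration above (sample_size, w_statistic, p_values). The DataFrame layout and helper function are assumptions for illustration.

import pandas as pd
from scipy import stats

def summarize_normality(df: pd.DataFrame, value_col: str, segment_col: str) -> pd.DataFrame:
    """Compute per-segment Shapiro-Wilk results in the summary-level layout."""
    rows = []
    for segment, group in df.groupby(segment_col):
        values = group[value_col].dropna()
        if len(values) < 3:              # Shapiro-Wilk needs at least 3 observations
            continue
        w_stat, p_val = stats.shapiro(values)
        rows.append({
            segment_col: segment,
            "sample_size": len(values),  # -> volume
            "w_statistic": w_stat,       # -> statistic
            "p_values": p_val,           # -> p_value
        })
    return pd.DataFrame(rows)

# e.g. summarize_normality(raw_observations, value_col="values", segment_col="data_source")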

Required Fields by Format

Record-Level Required

  • name: Metric name(s)
  • data_format: Must be "record"
  • data_column: Column name containing numeric data to test
  • dataset: Dataset reference

Summary-Level Required

  • name: Metric name(s)
  • data_format: Must be "summary"
  • volume: Volume count column name
  • statistic: Shapiro-Wilk W statistic column name
  • p_value: P-value column name
  • dataset: Dataset reference

Optional Fields

  • segment: List of column names for grouping

Output Columns

The metric produces the following output columns (a hypothetical example row follows the list):

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Number of observations tested
  • statistic: Shapiro-Wilk W statistic (0.0-1.0, higher = more normal)
  • p_value: Statistical significance of the test (0.0-1.0)
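
A hypothetical example of a single output row, assuming a segment on region; the exact container type depends on how results are materialised.

example_row = {
    "group_key": {"region": "EMEA"},  # struct of segment values (hypothetical)
    "volume": 250,                    # observations tested
    "statistic": 0.987,               # Shapiro-Wilk W
    "p_value": 0.42,                  # evidence against H0
}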

Fan-out Examples

Single Configuration

collections:
  basic_normality:
    metrics:
    - name:
      - data_normality
      data_format: record
      data_column: values
      metric_type: shapiro_wilk
    dataset: analysis_data

Segmented Analysis

collections:
  segmented_normality:
    metrics:
    - name:
      - regional_normality
      - product_normality
      data_format: record
      data_column: residuals
      segment:
      - - region
      - - product_type
      metric_type: shapiro_wilk
    dataset: model_diagnostics

Mixed Data Formats

collections:
  detailed_normality:
    metrics:
    - name:
      - record_normality
      data_format: record
      data_column: raw_values
      metric_type: shapiro_wilk
    dataset: detailed_data
  summary_normality:
    metrics:
    - name:
      - summary_normality
      data_format: summary
      volume: n_obs
      statistic: w_stat
      p_value: p_val
      metric_type: shapiro_wilk
    dataset: aggregated_results

Data Requirements

Record-Level Data

  • One row per observation
  • Data column: numeric values (any scale, missing values automatically excluded)
  • Minimum: 3 observations per group (Shapiro-Wilk requirement)
  • Maximum: 5000 observations per group (scipy implementation limit)
  • Optimal: 20-500 observations for best statistical power

Summary-Level Data

  • One row per group/segment
  • Volume: positive integers (number of observations tested)
  • Statistic: numeric values between 0.0 and 1.0
  • P-value: numeric values between 0.0 and 1.0

Interpretation Guidelines

Significance Levels

Choose your alpha (α) based on application requirements:

Alpha Value   Use Case                 Interpretation
0.01          Critical applications    Very strict normality requirement
0.05          Standard practice        Typical statistical significance
0.10          Exploratory analysis     Lenient normality assessment

Decision Framework

# Example interpretation logic in analysis recipe
interpretation_rules:
  highly_normal: "p_value >= 0.10"  # Strong evidence for normality
  possibly_normal: "0.05 <= p_value < 0.10"  # Weak evidence against normality  
  likely_non_normal: "0.01 <= p_value < 0.05"  # Moderate evidence against normality
  definitely_non_normal: "p_value < 0.01"  # Strong evidence against normality
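
The same bands expressed as a small Python helper, in case the classification is applied outside the recipe; the thresholds mirror the rules above.

def classify_normality(p_value: float) -> str:
    """Map a Shapiro-Wilk p-value onto the interpretation bands above."""
    if p_value >= 0.10:
        return "highly_normal"          # strong evidence for normality
    if p_value >= 0.05:
        return "possibly_normal"        # weak evidence against normality
    if p_value >= 0.01:
        return "likely_non_normal"      # moderate evidence against normality
    return "definitely_non_normal"      # strong evidence against normality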

Practical Guidelines

W Statistic Range   P-value Range     Interpretation          Recommended Action
0.95 - 1.00         p ≥ 0.10          Highly normal           Proceed with parametric methods
0.90 - 0.95         0.05 ≤ p < 0.10   Possibly normal         Use with caution, consider diagnostics
0.80 - 0.90         0.01 ≤ p < 0.05   Likely non-normal       Consider transformations
< 0.80              p < 0.01          Definitely non-normal   Use non-parametric methods

Note that the correspondence between W and p-value ranges depends on sample size, so treat these bands as rough guidance rather than exact cut-offs.

Use Cases in Credit Risk Modeling

1. Model Residual Analysis

Test normality assumptions for regression models:

collections:
  residual_normality:
    metrics:
    - name:
      - pd_model_residuals
      data_format: record
      data_column: model_residuals
      segment:
      - - model_version
        - time_period
      metric_type: shapiro_wilk
    dataset: model_validation

2. Pre-Test Validation

Validate assumptions before applying parametric statistical tests:

collections:
  pretest_normality:
    metrics:
    - name:
      - data_assumption_check
      data_format: record
      data_column: loss_rates
      segment:
      - - product_type
      metric_type: shapiro_wilk
    dataset: lgd_analysis

3. Distribution Monitoring

Monitor whether data distributions remain stable over time:

collections:
  distribution_stability:
    metrics:
    - name:
      - monthly_distribution_check
      data_format: record
      data_column: probability_scores
      segment:
      - - month
        - portfolio
      metric_type: shapiro_wilk
    dataset: monthly_monitoring

Sample Size Considerations

Minimum Requirements

# Configuration with sample size validation
validation_rules:
  minimum_observations: 3  # Shapiro-Wilk requirement
  recommended_minimum: 20  # For reliable results
  maximum_observations: 5000  # scipy limitation

Handling Large Datasets

For datasets larger than 5000 observations:

  1. Random Sampling: Take representative samples (see the sketch after this list)
  2. Segmentation: Break into smaller homogeneous groups
  3. Alternative Tests: Consider Anderson-Darling or Kolmogorov-Smirnov
  4. Visual Methods: Use Q-Q plots for large datasets
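
A minimal sketch of option 1, drawing a reproducible random subsample before testing; the 5000 cap and seed are illustrative and should follow your own sampling policy.

import numpy as np
from scipy import stats

def shapiro_on_sample(values, max_n: int = 5000, seed: int = 0):
    """Subsample large groups so the test stays within the per-group observation limit."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    if len(values) > max_n:
        rng = np.random.default_rng(seed)
        values = rng.choice(values, size=max_n, replace=False)
    return stats.shapiro(values)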

Error Handling

The metric handles edge cases gracefully; a sketch of equivalent guard logic follows the list:

  • Insufficient Data: Returns null values when n < 3
  • Excessive Data: Returns null values when n > 5000
  • Constant Values: Handles constant data that may cause scipy warnings
  • Missing Values: Automatically excludes null/missing observations
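
Equivalent guard logic, sketched in Python to show how the listed edge cases could map to null results; the metric's internal implementation may differ.

import numpy as np
from scipy import stats

def safe_shapiro(values):
    """Return (n, W, p), with None placeholders for the edge cases described above."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]   # missing values: automatically excluded
    n = len(values)
    if n < 3 or n > 5000:                # insufficient or excessive data
        return n, None, None
    if np.all(values == values[0]):      # constant values: W is undefined
        return n, None, None
    w_stat, p_val = stats.shapiro(values)
    return n, w_stat, p_val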

Advanced Configuration

Custom Validation Recipe

datasets:
  model_data:
    type: csv
    source: model_validation.csv
interpretation:
  normality_threshold: 0.05
  action_required: p_value < normality_threshold
  recommended_methods:
    normal_data: parametric_tests
    non_normal_data: non_parametric_tests
collections:
  comprehensive_normality:
    metrics:
    - name:
      - residual_normality
      - score_normality
      - exposure_normality
      data_format: record
      data_column:
      - model_residuals
      - risk_scores
      - log_exposure
      segment:
      - - model_type
      - - model_type
      - - model_type
      metric_type: shapiro_wilk
    dataset: model_data

Important Notes

  1. Sample Size Sensitivity: Very large samples may reject normality for trivial deviations
  2. Outlier Sensitivity: Extreme values strongly influence the test
  3. Multiple Testing: Adjust significance levels when testing multiple groups (see the adjustment sketch after these notes)
  4. Complementary Analysis: Use with visual diagnostics (Q-Q plots, histograms)
  5. Domain Context: Consider practical significance alongside statistical significance
  6. Alternative Methods: Have non-parametric alternatives ready when normality is rejected
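
For note 3, one common choice is the Bonferroni correction: divide the significance level by the number of groups tested. A minimal sketch, assuming per-group p-values are already available:

def bonferroni_reject(p_values: dict, alpha: float = 0.05) -> dict:
    """Flag groups still significant after a Bonferroni adjustment for multiple testing."""
    adjusted_alpha = alpha / len(p_values)
    return {group: p < adjusted_alpha for group, p in p_values.items()}

# e.g. bonferroni_reject({"EMEA": 0.004, "APAC": 0.030, "AMER": 0.200})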

Performance Considerations

  • Record-Level: Efficient for datasets up to 5000 observations per group
  • Summary-Level: Very fast for pre-computed statistics
  • Segmentation: Parallel processing across segments
  • Memory Usage: Minimal memory footprint due to streaming computation