# Shapiro-Wilk Normality Test

The `shapiro_wilk` metric performs the Shapiro-Wilk test to assess whether a dataset follows a normal distribution. The test is considered one of the most powerful normality tests, especially for small to medium sample sizes, and is essential for validating assumptions in statistical modeling.

**Metric Type:** `shapiro_wilk`
## Shapiro-Wilk Test

The Shapiro-Wilk test evaluates the null hypothesis that the data comes from a normal distribution. It computes a test statistic (W) that measures how closely the data matches a normal distribution.

**Test Statistic Range:** 0.0 to 1.0 (higher values indicate more normal-like data)

**Hypotheses:**

- H0 (Null): The data follows a normal distribution
- H1 (Alternative): The data does not follow a normal distribution

**Decision Rule:** If p-value < α (significance level), reject H0 and conclude the data is not normally distributed.
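To make the decision rule concrete, here is a minimal Python sketch using `scipy.stats.shapiro` (scipy is the implementation the sample-size limits below refer to); the sample data and the 0.05 threshold are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=100)  # synthetic, truly normal data

w_statistic, p_value = stats.shapiro(sample)

alpha = 0.05  # significance level
if p_value < alpha:
    print(f"W={w_statistic:.4f}, p={p_value:.4f}: reject H0 (non-normal)")
else:
    print(f"W={w_statistic:.4f}, p={p_value:.4f}: fail to reject H0 (consistent with normality)")
```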
## Configuration Fields

### Record-Level Data Format

For testing normality of individual observations:

```yaml
collections:
  residual_normality:
    metrics:
      - name:
          - model_residual_test
        data_format: record
        data_column: residuals
        segment:
          - - model_version
        metric_type: shapiro_wilk
        dataset: model_validation_data
```
### Summary-Level Data Format

For aggregating pre-computed Shapiro-Wilk test statistics:

```yaml
collections:
  aggregated_normality:
    metrics:
      - name:
          - combined_normality_test
        data_format: summary
        volume: sample_size
        statistic: w_statistic
        p_value: p_values
        segment:
          - - data_source
        metric_type: shapiro_wilk
        dataset: normality_results
```
## Required Fields by Format

### Record-Level Required

- `name`: Metric name(s)
- `data_format`: Must be `"record"`
- `data_column`: Column name containing numeric data to test
- `dataset`: Dataset reference

### Summary-Level Required

- `name`: Metric name(s)
- `data_format`: Must be `"summary"`
- `volume`: Volume count column name
- `statistic`: Shapiro-Wilk W statistic column name
- `p_value`: P-value column name
- `dataset`: Dataset reference

### Optional Fields

- `segment`: List of column names for grouping
## Output Columns

The metric produces the following output columns:

- `group_key`: Segmentation group identifier (struct of segment values)
- `volume`: Number of observations tested
- `statistic`: Shapiro-Wilk W statistic (0.0-1.0, higher = more normal)
- `p_value`: Statistical significance of the test (0.0-1.0)
## Fan-out Examples

### Single Configuration

```yaml
collections:
  basic_normality:
    metrics:
      - name:
          - data_normality
        data_format: record
        data_column: values
        metric_type: shapiro_wilk
        dataset: analysis_data
```
### Segmented Analysis

```yaml
collections:
  segmented_normality:
    metrics:
      - name:
          - regional_normality
          - product_normality
        data_format: record
        data_column: residuals
        segment:
          - - region
          - - product_type
        metric_type: shapiro_wilk
        dataset: model_diagnostics
```
### Mixed Data Formats

```yaml
collections:
  detailed_normality:
    metrics:
      - name:
          - record_normality
        data_format: record
        data_column: raw_values
        metric_type: shapiro_wilk
        dataset: detailed_data

  summary_normality:
    metrics:
      - name:
          - summary_normality
        data_format: summary
        volume: n_obs
        statistic: w_stat
        p_value: p_val
        metric_type: shapiro_wilk
        dataset: aggregated_results
```
## Data Requirements

### Record-Level Data
- One row per observation
- Data column: numeric values (any scale, missing values automatically excluded)
- Minimum: 3 observations per group (Shapiro-Wilk requirement)
- Maximum: 5000 observations per group (scipy implementation limit)
- Optimal: 20-500 observations for best statistical power
### Summary-Level Data
- One row per group/segment
- Volume: positive integers (number of observations tested)
- Statistic: numeric values between 0.0 and 1.0
- P-value: numeric values between 0.0 and 1.0
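As a point of reference, a sketch of how such summary rows might be pre-computed with pandas and scipy. The input columns `data_source` and `values` are hypothetical; the output columns `sample_size`, `w_statistic`, and `p_values` match the summary-level configuration shown earlier:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic raw data grouped by a hypothetical "data_source" column.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "data_source": ["a"] * 50 + ["b"] * 50,
    "values": rng.normal(size=100),
})

rows = []
for source, group in raw.groupby("data_source"):
    clean = group["values"].dropna()      # missing values excluded
    w, p = stats.shapiro(clean)
    rows.append({
        "data_source": source,
        "sample_size": len(clean),        # feeds the `volume` field
        "w_statistic": w,                 # feeds the `statistic` field
        "p_values": p,                    # feeds the `p_value` field
    })

summary = pd.DataFrame(rows)              # one row per group/segment
```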
## Interpretation Guidelines

### Significance Levels
Choose your alpha (α) based on application requirements:
| Alpha Value | Use Case | Interpretation |
|---|---|---|
| 0.01 | Critical applications | Very strict normality requirement |
| 0.05 | Standard practice | Typical statistical significance |
| 0.10 | Exploratory analysis | Lenient normality assessment |
### Decision Framework

```yaml
# Example interpretation logic in an analysis recipe
interpretation_rules:
  highly_normal: "p_value >= 0.10"            # Strong evidence for normality
  possibly_normal: "0.05 <= p_value < 0.10"   # Weak evidence against normality
  likely_non_normal: "0.01 <= p_value < 0.05" # Moderate evidence against normality
  definitely_non_normal: "p_value < 0.01"     # Strong evidence against normality
```
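The same rules expressed as a small Python helper; the function is illustrative, with the thresholds and category names taken from the recipe above:

```python
def interpret_normality(p_value: float) -> str:
    """Map a Shapiro-Wilk p-value to the categories defined above."""
    if p_value >= 0.10:
        return "highly_normal"          # strong evidence for normality
    if p_value >= 0.05:
        return "possibly_normal"        # weak evidence against normality
    if p_value >= 0.01:
        return "likely_non_normal"      # moderate evidence against normality
    return "definitely_non_normal"      # strong evidence against normality

print(interpret_normality(0.03))  # -> likely_non_normal
```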
### Practical Guidelines
| W Statistic Range | P-value Range | Interpretation | Recommended Action |
|---|---|---|---|
| 0.95 - 1.00 | p ≥ 0.10 | Highly normal | Proceed with parametric methods |
| 0.90 - 0.95 | 0.05 ≤ p < 0.10 | Possibly normal | Use with caution, consider diagnostics |
| 0.80 - 0.90 | 0.01 ≤ p < 0.05 | Likely non-normal | Consider transformations |
| < 0.80 | p < 0.01 | Definitely non-normal | Use non-parametric methods |
## Use Cases in Credit Risk Modeling

### 1. Model Residual Analysis

Test normality assumptions for regression models:

```yaml
collections:
  residual_normality:
    metrics:
      - name:
          - pd_model_residuals
        data_format: record
        data_column: model_residuals
        segment:
          - - model_version
            - time_period
        metric_type: shapiro_wilk
        dataset: model_validation
```
### 2. Pre-Test Validation

Validate assumptions before applying parametric statistical tests:

```yaml
collections:
  pretest_normality:
    metrics:
      - name:
          - data_assumption_check
        data_format: record
        data_column: loss_rates
        segment:
          - - product_type
        metric_type: shapiro_wilk
        dataset: lgd_analysis
```
### 3. Distribution Monitoring

Monitor whether data distributions remain stable over time:

```yaml
collections:
  distribution_stability:
    metrics:
      - name:
          - monthly_distribution_check
        data_format: record
        data_column: probability_scores
        segment:
          - - month
            - portfolio
        metric_type: shapiro_wilk
        dataset: monthly_monitoring
```
## Sample Size Considerations

### Minimum Requirements

```yaml
# Configuration with sample size validation
validation_rules:
  minimum_observations: 3     # Shapiro-Wilk requirement
  recommended_minimum: 20     # For reliable results
  maximum_observations: 5000  # scipy limitation
```
### Handling Large Datasets

For datasets larger than 5000 observations:

- Random Sampling: Take representative samples (see the sketch after this list)
- Segmentation: Break into smaller homogeneous groups
- Alternative Tests: Consider Anderson-Darling or Kolmogorov-Smirnov
- Visual Methods: Use Q-Q plots for large datasets
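A minimal sketch of the random-sampling option, assuming the scipy implementation and synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
large_sample = rng.normal(size=50_000)  # synthetic data above the 5000 cap

# Draw a representative random subsample within the supported range.
subsample = rng.choice(large_sample, size=5_000, replace=False)
w, p = stats.shapiro(subsample)
print(f"W={w:.4f}, p={p:.4f} on a 5000-observation subsample")
```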
## Error Handling
The metric handles edge cases gracefully:
- Insufficient Data: Returns null values when n < 3
- Excessive Data: Returns null values when n > 5000
- Constant Values: Handles constant data that may cause scipy warnings
- Missing Values: Automatically excludes null/missing observations
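A hypothetical wrapper that mirrors these documented edge cases (it is not the metric's actual implementation, just an illustration of the behaviour):

```python
import numpy as np
from scipy import stats

def safe_shapiro(values):
    """Return (W, p), or (None, None) for the edge cases listed above."""
    clean = np.asarray(values, dtype=float)
    clean = clean[~np.isnan(clean)]           # drop missing observations
    if clean.size < 3 or clean.size > 5000:   # documented sample-size bounds
        return None, None
    if np.ptp(clean) == 0:                    # constant data: test undefined
        return None, None
    result = stats.shapiro(clean)
    return result.statistic, result.pvalue
```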
## Advanced Configuration

### Custom Validation Recipe

```yaml
datasets:
  model_data:
    type: csv
    source: model_validation.csv

interpretation:
  normality_threshold: 0.05
  action_required: p_value < normality_threshold
  recommended_methods:
    normal_data: parametric_tests
    non_normal_data: non_parametric_tests

collections:
  comprehensive_normality:
    metrics:
      - name:
          - residual_normality
          - score_normality
          - exposure_normality
        data_format: record
        data_column:
          - model_residuals
          - risk_scores
          - log_exposure
        segment:
          - - model_type
          - - model_type
          - - model_type
        metric_type: shapiro_wilk
        dataset: model_data
```
## Important Notes
- Sample Size Sensitivity: Very large samples may reject normality for trivial deviations
- Outlier Sensitivity: Extreme values strongly influence the test
- Multiple Testing: Adjust significance levels when testing multiple groups (see the sketch after this list)
- Complementary Analysis: Use with visual diagnostics (Q-Q plots, histograms)
- Domain Context: Consider practical significance alongside statistical significance
- Alternative Methods: Have non-parametric alternatives ready when normality is rejected
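For the multiple-testing point, a minimal sketch of a Bonferroni correction applied to per-segment p-values; the segment names and values are illustrative:

```python
p_values = {"region_a": 0.030, "region_b": 0.210, "region_c": 0.004}

alpha = 0.05
adjusted_alpha = alpha / len(p_values)  # Bonferroni: divide alpha by number of tests

for segment, p in p_values.items():
    verdict = "reject H0" if p < adjusted_alpha else "fail to reject H0"
    print(f"{segment}: p={p:.3f} vs adjusted alpha={adjusted_alpha:.4f} -> {verdict}")
```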
## Performance Considerations
- Record-Level: Efficient for datasets up to 5000 observations per group
- Summary-Level: Very fast for pre-computed statistics
- Segmentation: Parallel processing across segments
- Memory Usage: Minimal memory footprint due to streaming computation