Normality Testing Metrics¶
This module provides statistical tests for assessing whether data follows a normal distribution, which is a key assumption for many statistical methods and model validation techniques.
Available Functions¶
Shapiro-Wilk Test¶
shapiro_wilk computes the Shapiro-Wilk W statistic and p-value for the null hypothesis that the data are drawn from a normal distribution. It accepts record-level observations or pre-computed summary results, optionally segmented by grouping columns.
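The parameter names below are taken from the usage examples on this page; the defaults are assumptions, so treat this as an orientation sketch rather than the authoritative signature:

```python
# Orientation sketch of the shapiro_wilk interface, inferred from the usage
# examples below. Defaults here are assumptions, not the library's actual API.
def shapiro_wilk(
    data,              # polars DataFrame holding the input data
    data_format,       # "record" or "summary"
    data_column=None,  # record format: column of values to test
    volume=None,       # summary format: column of observation counts
    statistic=None,    # summary format: column of pre-computed W statistics
    p_value=None,      # summary format: column of pre-computed p-values
    segment=None,      # optional list of columns for group-wise testing
):
    ...
```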
Usage Examples¶
Record-Level Data¶
Test normality of individual observations:
```python
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Create sample data
df = pl.DataFrame({
    "residuals": [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.05, 0.15],
    "model_version": ["v1", "v1", "v1", "v1", "v2", "v2", "v2", "v2"],
})

# Test normality of residuals by model version
result = shapiro_wilk(
    data=df,
    data_format="record",
    data_column="residuals",
    segment=["model_version"],
)
print(result)
```
Summary-Level Data¶
Work with pre-computed normality test statistics:
```python
# Pre-aggregated normality test results
df_summary = pl.DataFrame({
    "volume": [100, 150, 80],
    "statistic": [0.95, 0.88, 0.92],
    "p_value": [0.08, 0.02, 0.05],
    "data_source": ["training", "validation", "test"],
})

# Aggregate normality results across data sources
result = shapiro_wilk(
    data=df_summary,
    data_format="summary",
    volume="volume",
    statistic="statistic",
    p_value="p_value",
    segment=["data_source"],
)
```
Data Format Requirements¶
Record-Level Data¶
For testing normality of individual observations:
- data_column: Column containing numeric values to test for normality
- Optional: segment columns for group-wise testing
- Minimum: 3 observations per group (Shapiro-Wilk requirement)
- Maximum: 5000 observations per group (limit of the underlying scipy implementation, whose p-value approximation is unreliable for larger samples)
Summary-Level Data¶
For aggregating pre-computed test statistics:
- volume: Number of observations used in the original test
- statistic: The W statistic from Shapiro-Wilk test (0.0-1.0)
- p_value: The p-value from the normality test (0.0-1.0)
- Optional: segment columns for group identification
Output Columns¶
All normality functions return a polars DataFrame with the following columns:
- group_key: Segmentation group identifier (a struct of the segment values; see the unnesting sketch after this list)
- volume: Number of observations tested
- statistic: Test statistic value (for Shapiro-Wilk, W close to 1.0 indicates data consistent with normality)
- p_value: Statistical significance of the test
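Because group_key is a struct, it can be expanded back into one ordinary column per segment with polars' unnest. A minimal sketch, assuming result is the record-level output from the first example:

```python
# Expand the group_key struct into one column per segment value
# (e.g. model_version), which makes joins and filters easier.
flat = result.unnest("group_key")
print(flat.columns)  # e.g. ["model_version", "volume", "statistic", "p_value"]
```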
Interpretation Guidelines¶
Shapiro-Wilk Test Results¶
- Null Hypothesis (H0): The data follows a normal distribution
- Alternative Hypothesis (H1): The data does not follow a normal distribution
Decision Rules¶
Choose your significance level (alpha) based on your requirements:
- p_value < alpha: Reject H0 - Evidence against normality
- p_value >= alpha: Fail to reject H0 - Insufficient evidence against normality
Common Alpha Values¶
- 0.05 (5%): Standard significance level for most applications
- 0.01 (1%): More stringent threshold for critical applications
- 0.10 (10%): More lenient threshold for exploratory analysis
Practical Guidelines¶
| p-value Range | Interpretation | Suggested Action |
|---|---|---|
| p ≥ 0.10 | No evidence against normality | Normality assumption is reasonable |
| 0.05 ≤ p < 0.10 | Weak evidence against normality | Examine the assumption with diagnostics before relying on it |
| 0.01 ≤ p < 0.05 | Moderate evidence against normality | Treat normality as doubtful; consider alternatives |
| p < 0.01 | Strong evidence against normality | Prefer robust or non-parametric methods |
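These bands translate directly into a polars expression. A sketch that tags each tested segment with its interpretation from the table, assuming result follows the output schema above:

```python
import polars as pl

# Map each segment's p-value to its interpretation band from the table.
result = result.with_columns(
    pl.when(pl.col("p_value") >= 0.10).then(pl.lit("no evidence against normality"))
    .when(pl.col("p_value") >= 0.05).then(pl.lit("weak evidence against normality"))
    .when(pl.col("p_value") >= 0.01).then(pl.lit("moderate evidence against normality"))
    .otherwise(pl.lit("strong evidence against normality"))
    .alias("interpretation")
)
```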
Limitations and Considerations¶
Sample Size Constraints¶
- Minimum: 3 observations required for Shapiro-Wilk test
- Maximum: 5000 observations (scipy implementation limit)
- Optimal: roughly 20-500 observations, where the test has enough power to detect real departures without flagging trivial ones
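It can be worth validating group sizes against these bounds before calling shapiro_wilk. A minimal sketch using the record-level frame df from the first example:

```python
import polars as pl

# Count observations per segment and flag groups outside the 3-5000 window.
sizes = df.group_by("model_version").agg(pl.len().alias("n"))
out_of_range = sizes.filter((pl.col("n") < 3) | (pl.col("n") > 5000))
print(out_of_range)  # empty if every group satisfies the constraints
```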
Sensitivity Factors¶
- Outliers: Shapiro-Wilk is sensitive to extreme values
- Ties: Repeated values can affect test performance
- Sample Size: Very large samples may reject normality for trivial deviations
When to Use Normality Tests¶
Recommended Use Cases¶
- Pre-analysis Validation: Before applying parametric statistical tests
- Model Residual Analysis: Testing assumptions for regression models
- Quality Control: Monitoring data distribution consistency
- Method Selection: Choosing between parametric and non-parametric approaches
Alternative Approaches¶
If normality is rejected, consider:
- Visual Methods: Q-Q plots, histograms, density plots
- Robust Statistics: Methods that don't assume normality
- Data Transformation: Log, square root, or Box-Cox transformations (see the sketch after this list)
- Non-parametric Tests: Wilcoxon, Mann-Whitney, Kruskal-Wallis
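As an illustration of the transformation route, a strictly positive, right-skewed column can be log-transformed and re-tested. Here amounts is a hypothetical column name, and segment is omitted on the assumption that it is optional, per the format requirements above; values with zeros or negatives would need a shift or a different transform:

```python
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Hypothetical strictly positive, right-skewed values.
df_amounts = pl.DataFrame({"amounts": [1.2, 0.8, 3.5, 2.1, 0.5, 7.9, 1.1, 2.4]})

# Log-transform, then re-run the normality test on the transformed column.
df_log = df_amounts.with_columns(pl.col("amounts").log().alias("log_amounts"))
result_log = shapiro_wilk(
    data=df_log,
    data_format="record",
    data_column="log_amounts",
)
```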
Best Practices¶
Data Preparation¶
- Outlier Treatment: Decide how to handle extreme values before testing (Shapiro-Wilk is sensitive to them)
- Sufficient Sample Size: Ensure adequate observations for reliable results
- Segmentation Strategy: Test normality within homogeneous groups
- Missing Data: Handle missing values appropriately
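For the missing-data point, nulls in the tested column should be handled explicitly before testing; a minimal sketch:

```python
import polars as pl

df_raw = pl.DataFrame({"residuals": [0.1, None, -0.2, 0.05, 0.3, None, -0.1]})

# Drop rows with missing values in the tested column; nulls would otherwise
# distort per-group volumes (or error, depending on the implementation).
df_clean = df_raw.drop_nulls(subset=["residuals"])
```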
Result Interpretation¶
- Multiple Comparisons: Adjust significance levels when testing multiple groups (see the Bonferroni sketch after this list)
- Practical Significance: Consider effect size, not just statistical significance
- Domain Context: Apply domain knowledge to interpretation
- Complementary Analysis: Use with visual diagnostic tools
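For the multiple-comparisons point, the simplest adjustment is Bonferroni: divide the overall alpha by the number of groups tested. A sketch, assuming result is the output frame from the earlier examples:

```python
import polars as pl

alpha = 0.05
n_groups = result.height           # one output row per tested segment
adjusted_alpha = alpha / n_groups  # Bonferroni-adjusted per-test threshold

# Segments with evidence against normality under the adjusted threshold
non_normal = result.filter(pl.col("p_value") < adjusted_alpha)
```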
Common Applications in Credit Risk¶
For example, testing whether model residuals remain normally distributed across product segments and time periods (illustrative values; in practice the residuals come from your model run):

```python
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Example: testing normality of model residuals by product and time period
# (illustrative data; each product/period segment needs >= 3 observations)
residuals_df = pl.DataFrame({
    "residuals": [0.12, -0.31, 0.05, 0.22, -0.08, 0.15,
                  -0.19, 0.27, 0.03, -0.11, 0.09, -0.24],
    "time_period": ["2023Q1"] * 6 + ["2023Q2"] * 6,
    "product_type": ["mortgage", "mortgage", "mortgage", "card", "card", "card"] * 2,
})

# Test residual normality by product and time
results = shapiro_wilk(
    data=residuals_df,
    data_format="record",
    data_column="residuals",
    segment=["product_type", "time_period"],
)

# Check which segments fail the normality assumption at alpha = 0.05
non_normal_segments = results.filter(pl.col("p_value") < 0.05)
```
This documentation provides comprehensive guidance for using normality testing metrics to validate statistical assumptions and ensure appropriate method selection in your analysis recipes.