Normality Testing Metrics¶
This module provides statistical tests for assessing whether data follows a normal distribution, which is a key assumption for many statistical methods and model validation techniques.
Available Functions¶
Shapiro-Wilk Test¶
shapiro_wilk ¶
shapiro_wilk(
*,
name: str,
dataset: LazyFrame | DataFrame,
data_format: Literal["record_level", "summary_level"],
segment: SegmentCol = None,
**kwargs,
) -> ShapiroWilk
Compute the Shapiro-Wilk test for normality.
The Shapiro-Wilk test is a statistical test to assess whether a dataset follows a normal distribution. It is considered one of the most powerful normality tests, especially for small to medium sample sizes.
The test returns:
- statistic: The test statistic (W), which ranges from 0 to 1
- p_value: The p-value for the test
- volume: The number of observations used in the test

Interpretation guidelines:
- The null hypothesis (H0) assumes the data follows a normal distribution
- The alternative hypothesis (H1) assumes the data does not follow a normal distribution
- Compare p_value to your chosen significance level (alpha):
    * If p_value < alpha: evidence against normality (reject H0)
    * If p_value >= alpha: insufficient evidence against normality (fail to reject H0)
- Common alpha values: 0.05 (5%), 0.01 (1%), or 0.10 (10%)

Limitations:
- Requires at least 3 observations
- Maximum sample size is 5000 (scipy limitation)
- Sensitive to outliers and ties in the data
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name identifier for this metric instance. | required |
| dataset | LazyFrame \| DataFrame | The input dataset as either a LazyFrame or DataFrame. | required |
| data_format | Literal['record_level', 'summary_level'] | The format of the input data. | required |
| segment | SegmentCol | Optional list of column names to use for segmentation/grouping. | None |
| **kwargs | | Additional arguments based on data_format. | {} |
Record-level format args
data_column: The column containing the data to test for normality.
Summary-level format args
- volume: The column containing the count of observations.
- statistic: The column containing pre-computed Shapiro-Wilk statistics.
- p_value: The column containing pre-computed p-values.
Returns:
| Type | Description |
|---|---|
| ShapiroWilk | A ShapiroWilk metric instance ready for computation. |
Examples:
Record-level usage:
>>> import polars as pl
>>> from tnp_statistic_library.metrics.normality import shapiro_wilk
>>>
>>> # Create sample data
>>> df = pl.DataFrame({
... "values": [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2],
... "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
... })
>>>
>>> # Test normality for each group
>>> metric = shapiro_wilk(
... name="data_normality",
... dataset=df,
... data_format="record_level",
... data_column="values",
... segment=["group"]
... )
>>> result = metric.run_metric().collect()
Summary-level usage:
>>> df_summary = pl.DataFrame({
... "volume": [50, 45],
... "statistic": [0.95, 0.92],
... "p_value": [0.06, 0.03],
... "region": ["North", "South"]
... })
>>>
>>> metric = shapiro_wilk(
... name="regional_normality",
... dataset=df_summary,
... data_format="summary_level",
... volume="volume",
... statistic="statistic",
... p_value="p_value",
... segment=["region"]
... )
>>> result = metric.run_metric().collect()
Usage Examples¶
Record-Level Data¶
Test normality of individual observations:
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Create sample data
df = pl.DataFrame({
    "residuals": [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.05, 0.15],
    "model_version": ["v1", "v1", "v1", "v1", "v2", "v2", "v2", "v2"]
})

# Test normality of residuals by model version
result = shapiro_wilk(
    name="residual_normality",
    dataset=df,
    data_format="record_level",
    data_column="residuals",
    segment=["model_version"]
).run_metric().collect()

print(result)
Summary-Level Data¶
Work with pre-computed normality test statistics:
# Pre-aggregated normality test results
df_summary = pl.DataFrame({
    "volume": [100, 150, 80],
    "statistic": [0.95, 0.88, 0.92],
    "p_value": [0.08, 0.02, 0.05],
    "data_source": ["training", "validation", "test"]
})

# Report the pre-computed normality results for each data source
result = shapiro_wilk(
    name="combined_normality",
    dataset=df_summary,
    data_format="summary_level",
    volume="volume",
    statistic="statistic",
    p_value="p_value",
    segment=["data_source"]
).run_metric().collect()
Data Format Requirements¶
Record-Level Data¶
For testing normality of individual observations:
- data_column: Column containing numeric values to test for normality
- Optional: segment columns for group-wise testing
- Minimum: 3 observations per group (Shapiro-Wilk requirement)
- Maximum: 5000 observations per group (scipy limitation); a size-check sketch follows this list
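Because groups outside these bounds cannot be tested reliably, it can help to check segment sizes up front. The following is a minimal sketch, not part of the library API, assuming a record-level frame with illustrative column names:

import polars as pl

# Count observations per segment so groups outside the 3-5000 range
# can be flagged before running shapiro_wilk.
df = pl.DataFrame({
    "residuals": [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.05, 0.15],
    "model_version": ["v1", "v1", "v1", "v1", "v2", "v2", "v2", "v2"]
})

group_sizes = df.group_by("model_version").agg(pl.len().alias("n_obs"))
out_of_range = group_sizes.filter((pl.col("n_obs") < 3) | (pl.col("n_obs") > 5000))
print(out_of_range)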
Summary-Level Data¶
For aggregating pre-computed test statistics:
- volume: Number of observations used in the original test
- statistic: The W statistic from Shapiro-Wilk test (0.0-1.0)
- p_value: The p-value from the normality test (0.0-1.0)
- Optional: segment columns for group identification
Output Columns¶
All normality functions return:
- group_key: Segmentation group identifier (a struct of the segment values; see the unnesting sketch after this list)
- volume: Number of observations tested
- statistic: Test statistic value (higher values indicate more normal-like data)
- p_value: Statistical significance of the test
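Because group_key is a struct, it is often convenient to unnest it back into plain segment columns before filtering or joining. A minimal sketch, assuming a hypothetical results frame shaped like the output described above:

import polars as pl

# Hypothetical results frame with the documented output columns.
results = pl.DataFrame({
    "group_key": [{"model_version": "v1"}, {"model_version": "v2"}],
    "volume": [4, 4],
    "statistic": [0.97, 0.94],
    "p_value": [0.81, 0.62]
})

# Unnest the struct so each segment value becomes its own column.
flat = results.unnest("group_key")
print(flat)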
Interpretation Guidelines¶
Shapiro-Wilk Test Results¶
- Null Hypothesis (H0): The data follows a normal distribution
- Alternative Hypothesis (H1): The data does not follow a normal distribution
Decision Rules¶
Choose your significance level (alpha) based on your requirements; a sketch applying the rule in polars follows this list:
- p_value < alpha: Reject H0 - Evidence against normality
- p_value >= alpha: Fail to reject H0 - Insufficient evidence against normality
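The decision rule maps directly onto a single column expression over the metric output. A minimal sketch, assuming a hypothetical results frame with the documented p_value column and alpha = 0.05:

import polars as pl

alpha = 0.05  # chosen significance level

# Hypothetical results frame with the documented output columns.
results = pl.DataFrame({
    "group_key": [{"region": "North"}, {"region": "South"}],
    "volume": [50, 45],
    "statistic": [0.95, 0.92],
    "p_value": [0.06, 0.03]
})

# True where H0 (normality) is rejected at the chosen alpha.
flagged = results.with_columns((pl.col("p_value") < alpha).alias("reject_h0"))
print(flagged)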
Common Alpha Values¶
- 0.05 (5%): Standard significance level for most applications
- 0.01 (1%): More stringent threshold for critical applications
- 0.10 (10%): More lenient threshold for exploratory analysis
Practical Guidelines¶
| p-value Range | Interpretation | Action |
|---|---|---|
| p ≥ 0.10 | No evidence against normality | Reasonable to proceed assuming normality |
| 0.05 ≤ p < 0.10 | Weak evidence against normality | Consider normality assumption carefully |
| 0.01 ≤ p < 0.05 | Moderate evidence against normality | Likely not normal, consider alternatives |
| p < 0.01 | Strong evidence against normality | Data is not normal, use non-parametric methods |
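If you want these bands attached directly to the metric output, they can be expressed as a chained when/then over p_value. A minimal sketch, assuming a hypothetical results frame with the documented p_value column:

import polars as pl

results = pl.DataFrame({"p_value": [0.20, 0.07, 0.03, 0.004]})

# Label each p-value with the interpretation band from the table above.
labelled = results.with_columns(
    pl.when(pl.col("p_value") >= 0.10).then(pl.lit("no evidence against normality"))
    .when(pl.col("p_value") >= 0.05).then(pl.lit("weak evidence against normality"))
    .when(pl.col("p_value") >= 0.01).then(pl.lit("moderate evidence against normality"))
    .otherwise(pl.lit("strong evidence against normality"))
    .alias("interpretation")
)
print(labelled)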
Limitations and Considerations¶
Sample Size Constraints¶
- Minimum: 3 observations required for Shapiro-Wilk test
- Maximum: 5000 observations (scipy implementation limit)
- Optimal: 20-500 observations for best test power
Sensitivity Factors¶
- Outliers: Shapiro-Wilk is sensitive to extreme values
- Ties: Repeated values can affect test performance
- Sample Size: Very large samples may reject normality for trivial deviations
When to Use Normality Tests¶
Recommended Use Cases¶
- Pre-analysis Validation: Before applying parametric statistical tests
- Model Residual Analysis: Testing assumptions for regression models
- Quality Control: Monitoring data distribution consistency
- Method Selection: Choosing between parametric and non-parametric approaches
Alternative Approaches¶
If normality is rejected, consider:
- Visual Methods: Q-Q plots, histograms, density plots
- Robust Statistics: Methods that don't assume normality
- Data Transformation: Log, square root, or Box-Cox transformations (see the sketch after this list)
- Non-parametric Tests: Wilcoxon, Mann-Whitney, Kruskal-Wallis
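Transformations fit naturally into the same workflow: derive the transformed column in polars, then re-run the test on it. A minimal sketch, assuming strictly positive data and a log transform; the column names are illustrative:

import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Right-skewed, strictly positive data where a log transform may help.
df = pl.DataFrame({
    "loss_amount": [120.0, 95.0, 300.0, 80.0, 1500.0, 110.0, 240.0, 640.0]
})

# Add the transformed column, then test it like any other column.
df_transformed = df.with_columns(pl.col("loss_amount").log().alias("log_loss"))

result = shapiro_wilk(
    name="log_loss_normality",
    dataset=df_transformed,
    data_format="record_level",
    data_column="log_loss"
).run_metric().collect()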
Best Practices¶
Data Preparation¶
- Remove Outliers: Consider outlier treatment before testing
- Sufficient Sample Size: Ensure adequate observations for reliable results
- Segmentation Strategy: Test normality within homogeneous groups
- Missing Data: Handle missing values appropriately
Result Interpretation¶
- Multiple Comparisons: Adjust significance levels when testing multiple groups (see the Bonferroni sketch after this list)
- Practical Significance: Consider effect size, not just statistical significance
- Domain Context: Apply domain knowledge to interpretation
- Complementary Analysis: Use with visual diagnostic tools
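For the multiple-comparisons point, a simple Bonferroni correction judges each segment against alpha divided by the number of tests. A minimal sketch over a hypothetical results frame:

import polars as pl

alpha = 0.05

# Hypothetical per-segment results from a single shapiro_wilk run.
results = pl.DataFrame({
    "group_key": [{"product_type": "loan"}, {"product_type": "card"}, {"product_type": "mortgage"}],
    "p_value": [0.04, 0.20, 0.008]
})

# Bonferroni: each test is compared against alpha / number of tests.
adjusted_alpha = alpha / results.height
flagged = results.with_columns(
    (pl.col("p_value") < adjusted_alpha).alias("reject_h0_bonferroni")
)
print(flagged)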
Common Applications in Credit Risk¶
# Example: Testing normality of model residuals.
# In practice these lists come from your model outputs; the values
# below are illustrative placeholders (three observations per segment).
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

model_residuals = [0.12, -0.05, 0.30, -0.18, 0.07, 0.22, -0.11, 0.04, 0.09, -0.21, 0.16, 0.02]
time_periods = ["2023Q1"] * 6 + ["2023Q2"] * 6
product_types = (["loan"] * 3 + ["card"] * 3) * 2

residuals_df = pl.DataFrame({
    "residuals": model_residuals,
    "time_period": time_periods,
    "product_type": product_types
})

# Test residual normality by product and time
normality_test = shapiro_wilk(
    name="residual_normality_test",
    dataset=residuals_df,
    data_format="record_level",
    data_column="residuals",
    segment=["product_type", "time_period"]
)
results = normality_test.run_metric().collect()

# Check which segments fail the normality assumption at alpha = 0.05
non_normal_segments = results.filter(pl.col("p_value") < 0.05)
This documentation provides comprehensive guidance for using normality testing metrics to validate statistical assumptions and ensure appropriate method selection in your analysis workflows.