Normality Testing Metrics

This module provides statistical tests for assessing whether data follows a normal distribution, which is a key assumption for many statistical methods and model validation techniques.

Available Functions

Shapiro-Wilk Test

shapiro_wilk

shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level"],
    data_column: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["summary_level"],
    volume: str,
    statistic: str,
    p_value: str,
    segment: SegmentCol = None,
) -> ShapiroWilk
shapiro_wilk(
    *,
    name: str,
    dataset: LazyFrame | DataFrame,
    data_format: Literal["record_level", "summary_level"],
    segment: SegmentCol = None,
    **kwargs,
) -> ShapiroWilk

Compute the Shapiro-Wilk test for normality.

The Shapiro-Wilk test is a statistical test to assess whether a dataset follows a normal distribution. It is considered one of the most powerful normality tests, especially for small to medium sample sizes.

The test returns:

  • statistic: The test statistic (W), ranging from 0 to 1
  • p_value: The p-value for the test
  • volume: The number of observations used in the test

Interpretation guidelines:

  • The null hypothesis (H0) assumes the data follows a normal distribution
  • The alternative hypothesis (H1) assumes the data does not follow a normal distribution
  • Compare p_value to your chosen significance level (alpha):
      • If p_value < alpha: evidence against normality (reject H0)
      • If p_value >= alpha: insufficient evidence against normality (fail to reject H0)
  • Common alpha values: 0.05 (5%), 0.01 (1%), or 0.10 (10%)

Limitations:

  • Requires at least 3 observations
  • Maximum sample size is 5000 (scipy limitation)
  • Sensitive to outliers and ties in the data
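
The 5000-observation cap comes from scipy, which suggests the statistic and p-value correspond to what scipy.stats.shapiro computes. The sketch below illustrates that underlying computation directly; it is an illustration only, not the library's implementation.

import numpy as np
from scipy import stats

# Simulated normally distributed sample (illustrative only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

# For normal data we expect W close to 1 and a p-value at or above alpha
w_statistic, p_value = stats.shapiro(sample)
print(f"W = {w_statistic:.4f}, p = {p_value:.4f}")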

Parameters:

  • name (str, required): The name identifier for this metric instance.
  • dataset (LazyFrame | DataFrame, required): The input dataset as either a LazyFrame or DataFrame.
  • data_format (Literal["record_level", "summary_level"], required): The format of the input data.
  • segment (SegmentCol, default None): Optional list of column names to use for segmentation/grouping.
  • **kwargs (default {}): Additional arguments based on data_format.

Record-level format args:

  • data_column: The column containing the data to test for normality.

Summary-level format args:

  • volume: The column containing the count of observations.
  • statistic: The column containing pre-computed Shapiro-Wilk statistics.
  • p_value: The column containing pre-computed p-values.

Returns:

  • ShapiroWilk: A ShapiroWilk metric instance ready for computation.

Examples:

Record-level usage:

>>> import polars as pl
>>> from tnp_statistic_library.metrics.normality import shapiro_wilk
>>>
>>> # Create sample data
>>> df = pl.DataFrame({
...     "values": [1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.5, 1.3, 1.2],
...     "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
... })
>>>
>>> # Test normality for each group
>>> metric = shapiro_wilk(
...     name="data_normality",
...     dataset=df,
...     data_format="record_level",
...     data_column="values",
...     segment=["group"]
... )
>>> result = metric.run_metric().collect()

Summary-level usage:

>>> df_summary = pl.DataFrame({
...     "volume": [50, 45],
...     "statistic": [0.95, 0.92],
...     "p_value": [0.06, 0.03],
...     "region": ["North", "South"]
... })
>>>
>>> metric = shapiro_wilk(
...     name="regional_normality",
...     dataset=df_summary,
...     data_format="summary_level",
...     volume="volume",
...     statistic="statistic",
...     p_value="p_value",
...     segment=["region"]
... )
>>> result = metric.run_metric().collect()

Usage Examples

Record-Level Data

Test normality of individual observations:

import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Create sample data
df = pl.DataFrame({
    "residuals": [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.05, 0.15],
    "model_version": ["v1", "v1", "v1", "v1", "v2", "v2", "v2", "v2"]
})

# Test normality of residuals by model version
result = shapiro_wilk(
    name="residual_normality",
    dataset=df,
    data_format="record_level",
    data_column="residuals",
    segment=["model_version"]
).run_metric().collect()

print(result)

Summary-Level Data

Work with pre-computed normality test statistics:

# Pre-aggregated normality test results
df_summary = pl.DataFrame({
    "volume": [100, 150, 80],
    "statistic": [0.95, 0.88, 0.92],
    "p_value": [0.08, 0.02, 0.05],
    "data_source": ["training", "validation", "test"]
})

# Aggregate normality results across data sources
result = shapiro_wilk(
    name="combined_normality",
    dataset=df_summary,
    data_format="summary_level",
    volume="volume",
    statistic="statistic",
    p_value="p_value",
    segment=["data_source"]
).run_metric().collect()

Data Format Requirements

Record-Level Data

For testing normality of individual observations:

  • data_column: Column containing numeric values to test for normality
  • Optional: segment columns for group-wise testing
  • Minimum: 3 observations per group (Shapiro-Wilk requirement)
  • Maximum: 5000 observations per group (scipy limitation)
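
As a quick pre-check of these constraints, the per-segment row counts can be inspected before running the test. This is a sketch using the df from the record-level example above; pl.len() simply counts rows per group.

import polars as pl

# Count observations per segment and flag groups outside the 3-5000 range
counts = (
    df.group_by("model_version")
    .agg(pl.len().alias("n_obs"))
    .with_columns(
        ((pl.col("n_obs") >= 3) & (pl.col("n_obs") <= 5000)).alias("valid_size")
    )
)
print(counts)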

Summary-Level Data

For aggregating pre-computed test statistics:

  • volume: Number of observations used in the original test
  • statistic: The W statistic from Shapiro-Wilk test (0.0-1.0)
  • p_value: The p-value from the normality test (0.0-1.0)
  • Optional: segment columns for group identification

Output Columns

All normality functions return:

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Number of observations tested
  • statistic: Test statistic value (higher values indicate more normal-like data)
  • p_value: Statistical significance of the test
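
Because group_key is a struct of the segment values, it can be expanded back into ordinary columns after collecting. A minimal sketch, assuming the result DataFrame from the record-level example above:

# Expand the group_key struct into one column per segment value
flat = result.unnest("group_key")
print(flat.columns)  # e.g. ["model_version", "volume", "statistic", "p_value"]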

Interpretation Guidelines

Shapiro-Wilk Test Results

  • Null Hypothesis (H0): The data follows a normal distribution
  • Alternative Hypothesis (H1): The data does not follow a normal distribution

Decision Rules

Choose your significance level (alpha) based on your requirements:

  • p_value < alpha: Reject H0 - Evidence against normality
  • p_value >= alpha: Fail to reject H0 - Insufficient evidence against normality

Common Alpha Values

  • 0.05 (5%): Standard significance level for most applications
  • 0.01 (1%): More stringent threshold for critical applications
  • 0.10 (10%): More lenient threshold for exploratory analysis

Practical Guidelines

p-value Range    | Interpretation                       | Action
p ≥ 0.10         | No evidence against normality        | Safe to assume normality
0.05 ≤ p < 0.10  | Weak evidence against normality      | Consider the normality assumption carefully
0.01 ≤ p < 0.05  | Moderate evidence against normality  | Likely not normal; consider alternatives
p < 0.01         | Strong evidence against normality    | Data is not normal; use non-parametric methods
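
These bands can be applied directly to a collected result. A sketch, assuming result is the output of a shapiro_wilk run as in the earlier examples:

import polars as pl

# Map each segment's p-value onto the interpretation bands above
classified = result.with_columns(
    pl.when(pl.col("p_value") >= 0.10)
    .then(pl.lit("assume normality"))
    .when(pl.col("p_value") >= 0.05)
    .then(pl.lit("weak evidence against normality"))
    .when(pl.col("p_value") >= 0.01)
    .then(pl.lit("moderate evidence against normality"))
    .otherwise(pl.lit("strong evidence against normality"))
    .alias("interpretation")
)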

Limitations and Considerations

Sample Size Constraints

  • Minimum: 3 observations required for Shapiro-Wilk test
  • Maximum: 5000 observations (scipy implementation limit)
  • Optimal: 20-500 observations for best test power

Sensitivity Factors

  • Outliers: Shapiro-Wilk is sensitive to extreme values
  • Ties: Repeated values can affect test performance
  • Sample Size: Very large samples may reject normality for trivial deviations

When to Use Normality Tests

  1. Pre-analysis Validation: Before applying parametric statistical tests
  2. Model Residual Analysis: Testing assumptions for regression models
  3. Quality Control: Monitoring data distribution consistency
  4. Method Selection: Choosing between parametric and non-parametric approaches

Alternative Approaches

If normality is rejected, consider:

  • Visual Methods: Q-Q plots, histograms, density plots
  • Robust Statistics: Methods that don't assume normality
  • Data Transformation: Log, square root, or Box-Cox transformations
  • Non-parametric Tests: Wilcoxon, Mann-Whitney, Kruskal-Wallis
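
If a transformation looks appropriate, the transformed column can simply be re-tested. A sketch using a log transform on the df from the record-level docstring example (this assumes the values column is strictly positive):

import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# Log-transform the column and re-run the normality test on the result
df_transformed = df.with_columns(pl.col("values").log().alias("log_values"))

retest = shapiro_wilk(
    name="log_transformed_normality",
    dataset=df_transformed,
    data_format="record_level",
    data_column="log_values",
    segment=["group"],
).run_metric().collect()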

Best Practices

Data Preparation

  1. Remove Outliers: Consider outlier treatment before testing
  2. Sufficient Sample Size: Ensure adequate observations for reliable results
  3. Segmentation Strategy: Test normality within homogeneous groups
  4. Missing Data: Handle missing values appropriately
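
A minimal data-preparation sketch covering points 1 and 4, using the df with a residuals column from the record-level example above; the 1st/99th percentile trim is an arbitrary illustration, and the right outlier treatment should come from domain knowledge:

import polars as pl

# Drop missing values, then trim the most extreme 1% on each tail
clean_df = (
    df.drop_nulls(subset=["residuals"])
    .filter(
        pl.col("residuals").is_between(
            pl.col("residuals").quantile(0.01),
            pl.col("residuals").quantile(0.99),
        )
    )
)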

Result Interpretation

  1. Multiple Comparisons: Adjust significance levels when testing multiple groups
  2. Practical Significance: Consider effect size, not just statistical significance
  3. Domain Context: Apply domain knowledge to interpretation
  4. Complementary Analysis: Use with visual diagnostic tools
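
For point 1, a simple Bonferroni correction divides the overall alpha by the number of segments tested. A sketch, assuming result holds one Shapiro-Wilk row per segment as in the earlier examples:

import polars as pl

alpha = 0.05
n_tests = result.height  # one test per segment row
adjusted_alpha = alpha / n_tests

# Segments with evidence against normality at the adjusted threshold
flagged = result.filter(pl.col("p_value") < adjusted_alpha)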

Common Applications in Credit Risk

# Example: testing the normality of model residuals by segment
import polars as pl
from tnp_statistic_library.metrics.normality import shapiro_wilk

# model_residuals, time_periods and product_types are assumed to be
# pre-existing, equal-length sequences from your modelling pipeline
residuals_df = pl.DataFrame({
    "residuals": model_residuals,
    "time_period": time_periods,
    "product_type": product_types
})

# Test residual normality by product and time period
normality_test = shapiro_wilk(
    name="residual_normality_test",
    dataset=residuals_df,
    data_format="record_level",
    data_column="residuals",
    segment=["product_type", "time_period"]
)

results = normality_test.run_metric().collect()

# Check which segments fail the normality assumption at alpha = 0.05
non_normal_segments = results.filter(pl.col("p_value") < 0.05)

This documentation provides comprehensive guidance for using normality testing metrics to validate statistical assumptions and ensure appropriate method selection in your analysis workflows.