Median Summary

The Median Summary metric calculates the median of each specified variable, giving a measure of central tendency that is robust to outliers, with optional segmentation.

Configuration Fields

Required Fields

  • metric_type: Must be "median"
  • data_format: Must be "record"
  • name (string or list): Metric identifier(s)
  • variable (string or list): Name(s) of the column(s) to compute the median for
  • dataset (string): Dataset identifier (from the collection default or metric override)

Optional Fields

  • segment (string or list): Segment identifier(s) for grouping analysis

Output Columns

The Median Summary produces these output columns:

  • Standard identification columns (name, dataset, segment)
  • median: The calculated median value (50th percentile)

Data Requirements

  • Data must contain the column(s) named in variable, holding numeric values
  • Missing/null values are excluded from calculation
  • If segments are specified, data must contain the segment column(s)
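As a rough illustration of the missing-value rule above, the following sketch computes a median after dropping nulls. It uses only the Python standard library; the function name median_ignoring_missing is hypothetical, and treating both None and NaN as missing is an assumption, not a description of the metric's internal implementation.

```python
import math
from statistics import median


def median_ignoring_missing(values):
    """Return the median of the non-missing values, or None if none remain."""
    # Assumption: a value is "missing" if it is None or a float NaN.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return median(present) if present else None


# Hypothetical loan_amount column with two missing entries.
loan_amount = [12000, None, 8000, float("nan"), 15000]
result = median_ignoring_missing(loan_amount)  # median of 8000, 12000, 15000
```

Only the three valid observations contribute here, so the result is 12000.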

Fan-out Examples

Basic Configuration

collections:
  example:
    metrics:
    - name: median_loan_amount
      variable: loan_amount
      metric_type: median
      data_format: record
    dataset: loan_data

Multiple Variables

collections:
  example:
    metrics:
    - name:
      - med_income
      - med_score
      - med_debt
      variable:
      - annual_income
      - credit_score
      - total_debt
      metric_type: median
      data_format: record
    dataset: customer_data

This expands to:

  • med_income calculating median of annual_income
  • med_score calculating median of credit_score
  • med_debt calculating median of total_debt

Segmented Analysis

collections:
  example:
    dataset: customer_data
    metrics:
      - metric_type: median
        data_format: record
        name: regional_income_median
        variable: annual_income
        segment: ["region"]

This creates separate median calculations for each region.
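Conceptually, a segmented median groups the records by the segment column and takes the median within each group. The sketch below shows that idea with the standard library; median_by_segment and the sample rows are hypothetical and are not the library's API.

```python
from collections import defaultdict
from statistics import median


def median_by_segment(records, variable, segment):
    """Group rows by the segment column and return one median per group."""
    groups = defaultdict(list)
    for row in records:
        value = row.get(variable)
        if value is not None:  # missing values are left out of the calculation
            groups[row[segment]].append(value)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical customer_data rows.
customers = [
    {"region": "north", "annual_income": 40000},
    {"region": "north", "annual_income": 60000},
    {"region": "south", "annual_income": 55000},
    {"region": "south", "annual_income": None},
    {"region": "south", "annual_income": 45000},
    {"region": "south", "annual_income": 70000},
]
medians = median_by_segment(customers, "annual_income", "region")
```

With this sample data the north median is 50000 (the average of its two values) and the south median is 55000.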

Multiple Datasets

collections:
  quarterly_sales:
    metrics:
      - metric_type: median
        data_format: record
        name: q1_sales_median
        variable: sales_amount
        dataset: q1_data
      - metric_type: median
        data_format: record
        name: q2_sales_median
        variable: sales_amount
        dataset: q2_data

Complex Multi-dimensional Fan-out

collections:
  example:
    metrics:
    - name:
      - income_young
      - income_old
      - score_young
      - score_old
      variable:
      - annual_income
      - annual_income
      - credit_score
      - credit_score
      segment:
      - age_18_35
      - age_36_65
      - age_18_35
      - age_36_65
      metric_type: median
      data_format: record
    dataset: customer_data

Usage Notes

  • Robust Statistic: The median is less sensitive to outliers than the mean
  • Numeric Data: The variable must contain numeric data
  • Missing Values: Automatically excluded from the median calculation
  • Odd vs Even: With an odd number of observations the median is the middle value; with an even number it is the average of the two middle values
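A small numeric example may make the outlier and odd/even points concrete. It uses Python's statistics module; the response-time values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds, with one extreme value.
response_ms = [100, 110, 120, 130, 5000]

avg = mean(response_ms)    # pulled up to 1092 by the single outlier
mid = median(response_ms)  # stays at the middle value, 120

# With an even number of observations, the two middle values are averaged.
even_mid = median([100, 110, 120, 130])  # (110 + 120) / 2 = 115
```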

Fan-out Expansion Rules

When using lists in configuration:

  • name and variable must have matching lengths when specified as lists
  • segment can be a single value (applied to all) or a list whose length matches the other list fields
  • Each combination creates a separate metric calculation
  • All metrics of this type will have the same output column structure
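One way to picture these rules is as a positional pairing of the list fields, with single values repeated for every metric. The sketch below is an assumption about how such an expansion could work, not the library's actual code; expand_fanout is a hypothetical helper.

```python
def expand_fanout(metric):
    """Expand one metric entry with list-valued fields into single metrics."""
    names = metric["name"] if isinstance(metric["name"], list) else [metric["name"]]
    count = len(names)

    def per_metric(field):
        value = metric.get(field)
        if isinstance(value, list):
            if len(value) != count:
                raise ValueError(
                    f"{field} has {len(value)} entries but name has {count}"
                )
            return value
        # A single value (or an absent optional field) applies to every metric.
        return [value] * count

    variables = per_metric("variable")
    segments = per_metric("segment")
    return [
        {
            "name": name,
            "variable": variable,
            "segment": segment,
            "metric_type": metric["metric_type"],
            "data_format": metric["data_format"],
        }
        for name, variable, segment in zip(names, variables, segments)
    ]


entry = {
    "name": ["med_income", "med_score", "med_debt"],
    "variable": ["annual_income", "credit_score", "total_debt"],
    "metric_type": "median",
    "data_format": "record",
}
expanded = expand_fanout(entry)
```

Under these assumptions, the entry above yields three single-metric definitions, and a list whose length does not match name raises an error rather than being silently truncated.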

Statistical Notes

  • Median: Middle value when data is sorted (50th percentile)
  • Count: Number of non-null observations used in calculation
  • Outlier Resistance: Median provides stable central tendency even with extreme values

Use Cases

  • Skewed Distributions: More representative than the mean for highly skewed data
  • Income Analysis: Common for salary/income reporting, where a few high earners pull the mean upward
  • Performance Metrics: Response times, processing durations
  • Risk Assessment: Central tendency for loss amounts or exposure values
  • Quality Control: Median defect rates or error frequencies