Median Summary

The Median Summary metric calculates median values for specified variables, providing robust central tendency measures with optional segmentation.

Configuration Fields

Required Fields

name (string or list): Metric identifier(s)
dataset (string or list): Dataset identifier(s) to analyze
type: Must be "median"
variable (string or list): Name(s) of the numeric column(s) for which the median is calculated

Optional Fields

segment (string or list): Segment identifier(s) for grouping analysis

Output Columns

The Median Summary produces these output columns:

Standard identification columns (name, dataset, segment)
median: The calculated median value (50th percentile)

Data Requirements

Data must contain the column(s) named in variable, holding numeric values
Missing/null values are excluded from calculation
If segments are specified, data must contain the segment column(s)
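
The requirements above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name median_summary, the row layout, and the "all" key for unsegmented data are hypothetical, and it assumes that segment names a column whose values define the groups and that missing values appear as None.

```python
from collections import defaultdict
from statistics import median


def median_summary(rows, variable, segment=None):
    """Return {group: median} for `variable`, skipping None values.

    Hypothetical helper: with segment=None every row lands in one
    group keyed "all"; otherwise rows are grouped by row[segment].
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(variable)
        if value is None:  # missing/null values are excluded
            continue
        key = row[segment] if segment is not None else "all"
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


rows = [
    {"region": "north", "loan_amount": 100},
    {"region": "north", "loan_amount": None},  # ignored in the calculation
    {"region": "north", "loan_amount": 300},
    {"region": "south", "loan_amount": 250},
    {"region": "south", "loan_amount": 50},
    {"region": "south", "loan_amount": 150},
]

overall = median_summary(rows, "loan_amount")
by_region = median_summary(rows, "loan_amount", segment="region")
```

The sketch only treats None as missing; how the actual metric handles other null representations (for example NaN) is not stated here.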

Fan-out Examples

Basic Configuration

- name: median_loan_amount
  type: median
  dataset: loan_data
  variable: loan_amount

Multiple Variables

- name: [med_income, med_score, med_debt]
  type: median
  dataset: customer_data
  variable: [annual_income, credit_score, total_debt]

This expands to:

med_income calculating median of annual_income
med_score calculating median of credit_score
med_debt calculating median of total_debt

Segmented Analysis

- name: regional_income_median
  type: median
  dataset: customer_data
  variable: annual_income
  segment: [north, south, east, west]

This creates separate median calculations for each region.

Multiple Datasets and Variables

- name: [q1_sales_median, q2_sales_median]
  type: median
  dataset: [q1_data, q2_data]
  variable: [sales_amount, sales_amount]

This expands to:

q1_sales_median calculating median of sales_amount in q1_data
q2_sales_median calculating median of sales_amount in q2_data

Complex Multi-dimensional Fan-out

- name: [income_young, income_old, score_young, score_old]
  type: median
  dataset: customer_data
  variable: [annual_income, annual_income, credit_score, credit_score]
  segment: [age_18_35, age_36_65, age_18_35, age_36_65]

This expands to:

income_young calculating median of annual_income for segment age_18_35
income_old calculating median of annual_income for segment age_36_65
score_young calculating median of credit_score for segment age_18_35
score_old calculating median of credit_score for segment age_36_65

Usage Notes

Robust Statistic: Median is less sensitive to outliers than mean
Numeric Data: Variable must contain numeric data types
Missing Values: Automatically excluded from median calculation
Odd vs Even: With an odd number of observations the median is the middle value; with an even number it is the average of the two middle values
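
The notes on odd versus even counts and on outlier resistance can be checked with Python's standard statistics module. This is an illustration only, with made-up sample values; it does not show how the metric itself is implemented.

```python
from statistics import mean, median

# Odd number of observations: the middle value of the sorted data.
median_odd = median([3, 1, 2])       # sorted: 1, 2, 3

# Even number of observations: average of the two middle values.
median_even = median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4

# One extreme value shifts the mean far more than the median.
durations = [10, 12, 11, 13, 12]
with_outlier = durations + [1000]

median_before = median(durations)
median_after = median(with_outlier)
mean_before = mean(durations)
mean_after = mean(with_outlier)
```

Here the median stays at 12 after the outlier is added, while the mean rises from 11.6 to roughly 176.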

Fan-out Expansion Rules

When using lists in configuration:

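When given as lists, name, dataset, and variable must have matching lengths; a field given as a single value is applied to every expanded metric
segment can be a single value (applied to all) or a list matching the other fields' lengths
List entries are paired by position, and each resulting set of name, dataset, variable, and segment creates a separate metric calculation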
All metrics of this type will have the same output column structure
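
One way to picture these rules is the following Python sketch. The function expand_fanout is hypothetical and is not the tool's own code; it assumes that list-valued fields are paired by position and that single values are repeated for every expanded metric, as the examples above suggest.

```python
def expand_fanout(config, fields=("name", "dataset", "variable", "segment")):
    """Expand one config entry into a list of single-metric entries.

    Assumption: list-valued fields are paired by position and
    scalar fields are repeated for each expanded entry.
    """
    present = {f: config[f] for f in fields if f in config}
    lengths = {len(v) for v in present.values() if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError(f"list fields have mismatched lengths: {sorted(lengths)}")
    count = lengths.pop() if lengths else 1

    expanded = []
    for i in range(count):
        # Copy non-fan-out keys (such as type) unchanged.
        entry = {k: v for k, v in config.items() if k not in present}
        for field, value in present.items():
            entry[field] = value[i] if isinstance(value, list) else value
        expanded.append(entry)
    return expanded


config = {
    "name": ["med_income", "med_score", "med_debt"],
    "type": "median",
    "dataset": "customer_data",
    "variable": ["annual_income", "credit_score", "total_debt"],
}
metrics = expand_fanout(config)
```

Under these assumptions the entry above yields three metrics, each with dataset customer_data, and a config whose lists differ in length is rejected with a ValueError.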

Statistical Notes

Median: Middle value when data is sorted (50th percentile)
Count: Number of non-null observations used in calculation
Outlier Resistance: Median provides stable central tendency even with extreme values

Use Cases

Skewed Distributions: Better than mean for highly skewed data
Income Analysis: Common for salary/income reporting due to high earners
Performance Metrics: Response times, processing durations
Risk Assessment: Central tendency for loss amounts or exposure values
Quality Control: Median defect rates or error frequencies