Median Summary¶
The Median Summary metric calculates median values for specified variables, providing robust central tendency measures with optional segmentation.
Configuration Fields¶
Required Fields¶
- metric_type: Must be "median"
- data_format: Must be "record"
- name (string or list): Metric identifier(s)
- variable (string or list): Variable column name(s) for median calculation
- dataset (string): Dataset identifier (from the collection default or metric override)
Optional Fields¶
- segment (string or list): Segment identifier(s) for grouping analysis
Output Columns¶
The Median Summary produces these output columns:
- Standard identification columns (name, dataset, segment)
- median: The calculated median value (50th percentile)
Data Requirements¶
- Data must contain the column(s) named in variable, with numeric values
- Missing/null values are excluded from calculation
- If segments are specified, data must contain the segment column(s)
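The requirements above can be illustrated with a minimal Python sketch. It is not the library's implementation: the helper name `median_excluding_nulls` and the sample records are hypothetical, and it assumes missing values appear as `None` or NaN and that an all-missing column yields no median.

```python
import math
from statistics import median

def median_excluding_nulls(records, variable):
    """Median of records[variable], skipping None/NaN (hypothetical helper)."""
    values = []
    for row in records:
        if variable not in row:
            raise KeyError(f"missing variable column: {variable!r}")
        value = row[variable]
        # Treat None and NaN as missing and exclude them from the calculation.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        values.append(value)
    if not values:
        return None  # assumption: no non-null values means no median
    return median(values)

records = [
    {"loan_amount": 1000.0},
    {"loan_amount": None},
    {"loan_amount": 3000.0},
    {"loan_amount": float("nan")},
    {"loan_amount": 2000.0},
]
print(median_excluding_nulls(records, "loan_amount"))  # 2000.0
```

Only the three non-null amounts contribute, so the result is the middle of 1000, 2000, and 3000.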
Fan-out Examples¶
Basic Configuration¶
collections:
  example:
    metrics:
      - name: median_loan_amount
        variable: loan_amount
        metric_type: median
        data_format: record
        dataset: loan_data
Multiple Variables¶
collections:
  example:
    metrics:
      - name:
          - med_income
          - med_score
          - med_debt
        variable:
          - annual_income
          - credit_score
          - total_debt
        metric_type: median
        data_format: record
        dataset: customer_data
This expands to:
- med_income calculating median of annual_income
- med_score calculating median of credit_score
- med_debt calculating median of total_debt
Segmented Analysis¶
collections:
  example:
    dataset: customer_data
    metrics:
      - metric_type: median
        data_format: record
        name: regional_income_median
        variable: annual_income
        segment: ["region"]
This creates separate median calculations for each region.
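As a rough illustration of what a segmented median means, the following Python sketch groups records by a segment column and computes one median per group. The function `median_by_segment` and the sample data are hypothetical and only mirror the configuration above; the library's actual grouping logic may differ.

```python
from collections import defaultdict
from statistics import median

def median_by_segment(records, variable, segment):
    """Group records by the segment column and take the median per group."""
    groups = defaultdict(list)
    for row in records:
        value = row[variable]
        if value is not None:  # missing values are excluded
            groups[row[segment]].append(value)
    return {key: median(vals) for key, vals in groups.items()}

customers = [
    {"region": "north", "annual_income": 40000},
    {"region": "north", "annual_income": 60000},
    {"region": "south", "annual_income": 30000},
    {"region": "south", "annual_income": 50000},
    {"region": "south", "annual_income": 90000},
]
result = median_by_segment(customers, "annual_income", "region")
print(result)
```

Each distinct value of `region` gets its own median: here both regions come out at 50000, one as the average of two values and one as a middle value.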
Multiple Datasets and Variables¶
collections:
  quarterly_sales:
    metrics:
      - metric_type: median
        data_format: record
        name: q1_sales_median
        variable: sales_amount
        dataset: q1_data
      - metric_type: median
        data_format: record
        name: q2_sales_median
        variable: sales_amount
        dataset: q2_data
Complex Multi-dimensional Fan-out¶
collections:
  example:
    metrics:
      - name:
          - income_young
          - income_old
          - score_young
          - score_old
        variable:
          - annual_income
          - annual_income
          - credit_score
          - credit_score
        segment:
          - age_18_35
          - age_36_65
          - age_18_35
          - age_36_65
        metric_type: median
        data_format: record
        dataset: customer_data
Usage Notes¶
- Robust Statistic: Median is less sensitive to outliers than mean
- Numeric Data: Variable must contain numeric data types
- Missing Values: Automatically excluded from median calculation
- Odd vs Even: With an odd number of observations the median is the middle value; with an even number it is the average of the two middle values
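The odd/even rule and the outlier behaviour described in these notes can be checked with Python's standard `statistics` module. This is a plain illustration with made-up numbers, not output from the metric itself.

```python
from statistics import mean, median

odd = [3, 1, 2]
even = [4, 1, 3, 2]
print(median(odd))   # 2   (middle value of 1, 2, 3)
print(median(even))  # 2.5 (average of the middle values 2 and 3)

# One extreme value moves the mean far more than the median.
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [1000]
print(mean(incomes), median(incomes))            # 40 40
print(mean(with_outlier), median(with_outlier))  # 200 42.5
```

Adding a single value of 1000 shifts the mean from 40 to 200, while the median only moves from 40 to 42.5.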
Fan-out Expansion Rules¶
When using lists in configuration:
- name and segment must have matching lengths when specified as lists
- segment can be a single value (applied to all) or a list matching other field lengths
- Each combination creates a separate metric calculation
- All metrics of this type will have the same output column structure
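One way to picture these rules is the Python sketch below. It is an assumption-laden illustration, not the library's expansion code: `expand_fanout` is a hypothetical helper, it pairs list entries by position, and it broadcasts every single-valued field the same way the rules describe for segment.

```python
def expand_fanout(name, variable, segment=None):
    """Expand list-valued fields into one config dict per metric (hypothetical)."""
    fields = {"name": name, "variable": variable, "segment": segment}
    # All list-valued fields must agree on length.
    lengths = {len(v) for v in fields.values() if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError("list-valued fields must have matching lengths")
    n = lengths.pop() if lengths else 1
    # Broadcast single values (for example one segment) across all n metrics.
    columns = {k: v if isinstance(v, list) else [v] * n for k, v in fields.items()}
    # Pair entries by position: the i-th name goes with the i-th variable.
    return [dict(zip(columns, row)) for row in zip(*columns.values())]

expanded = expand_fanout(
    name=["med_income", "med_score"],
    variable=["annual_income", "credit_score"],
    segment="region",
)
for metric in expanded:
    print(metric)
```

Here the single segment value is applied to both metrics, while a call with lists of different lengths raises a `ValueError`.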
Statistical Notes¶
- Median: Middle value when data is sorted (50th percentile)
- Count: Number of non-null observations used in calculation
- Outlier Resistance: Median provides stable central tendency even with extreme values
Use Cases¶
- Skewed Distributions: Better than mean for highly skewed data
- Income Analysis: Common for salary/income reporting due to high earners
- Performance Metrics: Response times, processing durations
- Risk Assessment: Central tendency for loss amounts or exposure values
- Quality Control: Median defect rates or error frequencies