Median Summary
The Median Summary metric computes the median of one or more numeric variables, giving a robust measure of central tendency, with optional segmentation.
Configuration Fields
Required Fields
- name (string or list): Metric identifier(s)
- dataset (string or list): Dataset identifier(s) to analyze
- type: Must be "median"
- variable (string or list): Name(s) of the column(s) to compute the median for
Optional Fields
- segment (string or list): Segment identifier(s) used to group the analysis
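As a minimal sketch, assuming a config entry has already been parsed into a Python dict (for example from YAML), the field rules above could be checked as follows. `validate_median_config` is a hypothetical helper written for illustration; it is not part of the tool, and it only checks the rules listed in this section.

```python
from typing import Any

# Field names come from the Configuration Fields list; the helper itself is hypothetical.
REQUIRED_FIELDS = ("name", "dataset", "type", "variable")
STRING_OR_LIST_FIELDS = ("name", "dataset", "variable", "segment")


def validate_median_config(entry: dict[str, Any]) -> list[str]:
    """Return the problems found in one config entry (an empty list means valid)."""
    problems = []
    # Every required field must be present.
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing required field: {field}")
    # The type field is fixed for this metric.
    if "type" in entry and entry["type"] != "median":
        problems.append(f'type must be "median", got {entry["type"]!r}')
    # name, dataset, variable, and segment accept a string or a list.
    for field in STRING_OR_LIST_FIELDS:
        if field in entry and not isinstance(entry[field], (str, list)):
            problems.append(f"{field} must be a string or a list")
    return problems


entry = {
    "name": "med_income",
    "type": "median",
    "dataset": "customer_data",
    "variable": "annual_income",
}
print(validate_median_config(entry))  # []
```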
Output Columns
The Median Summary produces these output columns:
- Standard identification columns (name, dataset, segment)
- median: The calculated median value (50th percentile)
Data Requirements
- The data must contain the specified variable column(s), with numeric values
- Missing/null values are excluded from the calculation
- If segments are specified, the data must also contain the segment column(s)
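The requirements above can be sketched as a pre-processing step. This assumes, for illustration only, that a dataset is a dict mapping column names to lists of values, and that both `None` and NaN count as "null"; `usable_values` is a hypothetical helper name.

```python
import math
import statistics
from numbers import Real


def usable_values(data, column, segment_columns=()):
    """Return the non-missing numeric values of `column` (hypothetical helper).

    `data` is assumed to be a dict mapping column names to equal-length lists.
    """
    # The variable column and any segment columns must exist in the data.
    for required in (column, *segment_columns):
        if required not in data:
            raise KeyError(f"dataset has no column {required!r}")

    values = []
    for value in data[column]:
        # Missing values (None or NaN) are skipped, not treated as zero.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"column {column!r} must be numeric, got {value!r}")
        values.append(float(value))
    return values


data = {"annual_income": [52000, None, 61000, float("nan"), 48000]}
values = usable_values(data, "annual_income")
print(values)                     # [52000.0, 61000.0, 48000.0]
print(statistics.median(values))  # 52000.0
```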
Fan-out Examples
Basic Configuration
Multiple Variables
```yaml
- name: [med_income, med_score, med_debt]
  type: median
  dataset: customer_data
  variable: [annual_income, credit_score, total_debt]
```
This expands to:
- med_income: median of annual_income
- med_score: median of credit_score
- med_debt: median of total_debt
Segmented Analysis
```yaml
- name: regional_income_median
  type: median
  dataset: customer_data
  variable: annual_income
  segment: [north, south, east, west]
```
This creates a separate median of annual_income for each of the four regional segments.
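The configuration does not say how a segment identifier such as `north` maps onto rows. The sketch below assumes, purely for illustration, that each identifier names a boolean column flagging the rows that belong to that segment, and that a segment with no usable values yields `None`. Both choices, and the helper name `segmented_medians`, are assumptions rather than documented behavior.

```python
import statistics


def segmented_medians(data, variable, segments):
    """Median of `variable` within each segment (hypothetical helper).

    Assumes each segment identifier names a boolean column in `data`
    that is True for rows belonging to that segment.
    """
    results = {}
    for segment in segments:
        # Keep rows flagged for this segment, dropping null values.
        values = [
            value
            for value, in_segment in zip(data[variable], data[segment])
            if in_segment and value is not None
        ]
        # Assumption: an empty segment reports None instead of failing.
        results[segment] = statistics.median(values) if values else None
    return results


data = {
    "annual_income": [40000, 55000, None, 72000, 61000, 38000],
    "north": [True, True, True, False, False, False],
    "south": [False, False, False, True, True, True],
}
print(segmented_medians(data, "annual_income", ["north", "south"]))
# {'north': 47500.0, 'south': 61000}
```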
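Multiple Datasets and Variables
```yaml
- name: [q1_sales_median, q2_sales_median]
  type: median
  dataset: [q1_data, q2_data]
  variable: [sales_amount, sales_amount]
```
This produces q1_sales_median (median of sales_amount in q1_data) and q2_sales_median (median of sales_amount in q2_data).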
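Complex Multi-dimensional Fan-out
```yaml
- name: [income_young, income_old, score_young, score_old]
  type: median
  dataset: customer_data
  variable: [annual_income, annual_income, credit_score, credit_score]
  segment: [age_18_35, age_36_65, age_18_35, age_36_65]
```
Entries at the same position pair up across the lists, so income_young is the median of annual_income within age_18_35, and score_old is the median of credit_score within age_36_65.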
Usage Notes
- Robust Statistic: The median is less sensitive to outliers than the mean
- Numeric Data: The variable column(s) must have a numeric data type
- Missing Values: Nulls are automatically excluded from the median calculation
- Odd vs Even: With an odd number of observations, the median is the middle value; with an even number, it is the average of the two middle values
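The odd/even rule can be made concrete with a small hand-written median; Python's `statistics.median` applies the same rule. The income figures below are invented for illustration and show how one extreme value drags the mean far away while the median barely moves.

```python
def median(values):
    """Median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the single middle value.
        return ordered[mid]
    # Even length: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5

# One very high earner inflates the mean but not the median.
incomes = [42000, 48000, 51000, 55000, 2_000_000]
print(median(incomes))               # 51000
print(sum(incomes) / len(incomes))   # 439200.0
```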
Fan-out Expansion Rules
When using lists in the configuration:
- name, dataset, and variable must have matching lengths when specified as lists
- segment can be a single value (applied to all metrics) or a list matching the length of the other fields
- Entries at the same list position combine into one metric calculation
- All metrics of this type share the same output column structure
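The expansion rules above can be sketched as a position-by-position pairing. `expand_median_config` is a hypothetical helper that applies the rules as written: list-valued name, dataset, and variable fields must share one length, scalar values apply to every metric, and segment may be a scalar or a list of that same length. Because it follows the rules literally, it rejects a scalar name combined with a segment list, a case these rules do not define.

```python
LIST_FIELDS = ("name", "dataset", "variable")


def expand_median_config(entry):
    """Expand one config entry into one metric spec per list position (hypothetical)."""
    # All list-valued name/dataset/variable fields must have the same length.
    lengths = {len(entry[f]) for f in LIST_FIELDS if isinstance(entry.get(f), list)}
    if len(lengths) > 1:
        raise ValueError(f"name, dataset and variable lists differ in length: {sorted(lengths)}")
    n = lengths.pop() if lengths else 1

    # segment is either a single value or a list matching that length.
    segment = entry.get("segment")
    if isinstance(segment, list) and len(segment) != n:
        raise ValueError("segment list must match the length of the other lists")

    def pick(value, i):
        # Lists are read position by position; scalars apply to every metric.
        return value[i] if isinstance(value, list) else value

    return [
        {
            "name": pick(entry["name"], i),
            "type": "median",
            "dataset": pick(entry["dataset"], i),
            "variable": pick(entry["variable"], i),
            "segment": pick(segment, i),
        }
        for i in range(n)
    ]


complex_entry = {
    "name": ["income_young", "income_old", "score_young", "score_old"],
    "type": "median",
    "dataset": "customer_data",
    "variable": ["annual_income", "annual_income", "credit_score", "credit_score"],
    "segment": ["age_18_35", "age_36_65", "age_18_35", "age_36_65"],
}
for spec in expand_median_config(complex_entry):
    print(spec["name"], spec["variable"], spec["segment"])
```

Pairing by position, rather than taking every combination of values, matches the Complex Multi-dimensional Fan-out example, where four names yield exactly four metrics.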
Statistical Notes
- Median: The middle value of the sorted data (50th percentile)
- Count: The number of non-null observations used in the calculation
- Outlier Resistance: The median remains a stable measure of central tendency even when the data contains extreme values
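A minimal sketch of computing the median together with the count of non-null observations, assuming the values arrive as a plain Python list in which `None` or NaN marks a missing value. Returning `(None, 0)` when nothing is observed is an assumption; the document does not say what the metric reports for empty input. The helper name `median_with_count` is hypothetical.

```python
import math
import statistics


def median_with_count(values):
    """Return (median, count) over the non-null observations (hypothetical helper)."""
    # Drop missing values (None or NaN) before computing anything.
    observed = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # Assumption: with no observations, report no median and a count of zero.
    if not observed:
        return None, 0
    return statistics.median(observed), len(observed)


# e.g. response times in milliseconds, two of them missing
print(median_with_count([120, None, 95, 310, float("nan"), 101]))  # (110.5, 4)
```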
Use Cases
- Skewed Distributions: More representative than the mean for highly skewed data
- Income Analysis: Common in salary and income reporting, where a few high earners would inflate the mean
- Performance Metrics: Response times and processing durations
- Risk Assessment: Central tendency of loss amounts or exposure values
- Quality Control: Median defect rates or error frequencies