Mean Summary¶
The Mean Summary metric calculates arithmetic mean values for specified variables, with optional segmentation.
Configuration Fields¶
Required Fields¶
- metric_type: Must be "mean"
- data_format: Must be "record"
- name (string or list): Metric identifier(s)
- variable (string or list): Variable column name(s) for mean calculation
- dataset (string): Dataset identifier (from the collection default or metric override)
Optional Fields¶
- segment (string or list): Segment identifier(s) for grouping analysis
Output Columns¶
The Mean Summary produces these output columns:
- Standard identification columns (name, dataset, segment)
- mean: The calculated arithmetic mean value
Data Requirements¶
- Data must contain the specified variable column(s) with numeric values
- Missing/null values are excluded from calculation
- If segments are specified, data must contain the segment column(s)
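To make these requirements concrete, here is a minimal Python sketch of a mean that skips missing values. The helper name `mean_ignoring_missing` and the sample records are hypothetical and only illustrate the behaviour described above; the library's own implementation may differ.

```python
import math
from typing import Iterable, Mapping, Optional


def mean_ignoring_missing(records: Iterable[Mapping], variable: str) -> Optional[float]:
    """Arithmetic mean of `variable`, skipping None and NaN entries.

    Hypothetical helper for illustration; returns None when no usable values remain.
    """
    values = []
    for record in records:
        value = record.get(variable)
        if value is None:
            continue  # missing value: excluded from the calculation
        if isinstance(value, float) and math.isnan(value):
            continue  # NaN is treated as missing in this sketch
        values.append(value)
    if not values:
        return None
    return sum(values) / len(values)


loans = [
    {"loan_amount": 1000.0},
    {"loan_amount": None},    # missing: excluded
    {"loan_amount": 0.0},     # zero: included
    {"loan_amount": 2000.0},
]
result = mean_ignoring_missing(loans, "loan_amount")
print(result)  # 1000.0 (3000.0 divided by the 3 non-missing values)
```

Note that the denominator counts only the non-missing records, so a column with many nulls is averaged over fewer rows than the dataset contains.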
Fan-out Examples¶
Basic Configuration¶
collections:
  example:
    metrics:
      - name: average_loan_amount
        variable: loan_amount
        metric_type: mean
        data_format: record
        dataset: loan_data
Multiple Variables¶
collections:
  example:
    metrics:
      - name:
          - avg_income
          - avg_score
          - avg_debt
        variable:
          - annual_income
          - credit_score
          - total_debt
        metric_type: mean
        data_format: record
        dataset: customer_data
This expands to:
- avg_income calculating mean of annual_income
- avg_score calculating mean of credit_score
- avg_debt calculating mean of total_debt
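The expansion above pairs names and variables by position. A minimal Python sketch of that pairing follows; it assumes the configuration has already been loaded into a dictionary, and the variable names are illustrative rather than part of the library.

```python
# Metric definition from the example above, as a plain dictionary.
metric = {
    "name": ["avg_income", "avg_score", "avg_debt"],
    "variable": ["annual_income", "credit_score", "total_debt"],
    "metric_type": "mean",
    "data_format": "record",
    "dataset": "customer_data",
}

# Pair the i-th name with the i-th variable; shared fields are copied as-is.
shared = {k: v for k, v in metric.items() if k not in ("name", "variable")}
expanded = [
    {"name": name, "variable": variable, **shared}
    for name, variable in zip(metric["name"], metric["variable"])
]

for item in expanded:
    print(f'{item["name"]}: mean of {item["variable"]}')
```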
Segmented Analysis¶
collections:
  example:
    dataset: customer_data
    metrics:
      - metric_type: mean
        data_format: record
        name: regional_income_avg
        variable: annual_income
        segment: ["region"]
This creates separate mean calculations for each region.
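As an illustration of what a per-segment mean involves, the following Python sketch groups records by a segment column and averages each group. The function `mean_by_segment` and the sample data are hypothetical; this is not the library's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def mean_by_segment(records: Iterable[Mapping], variable: str, segment: str) -> Dict[str, float]:
    """Mean of `variable` per distinct value of `segment`, skipping None values."""
    totals = defaultdict(lambda: [0.0, 0])  # segment value -> [running sum, count]
    for record in records:
        value = record.get(variable)
        if value is None:
            continue  # missing values do not contribute to any segment
        bucket = totals[record[segment]]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


customers = [
    {"region": "north", "annual_income": 50000},
    {"region": "north", "annual_income": 70000},
    {"region": "south", "annual_income": 40000},
    {"region": "south", "annual_income": None},  # excluded from the south mean
]
by_region = mean_by_segment(customers, "annual_income", "region")
print(by_region)  # {'north': 60000.0, 'south': 40000.0}
```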
Multiple Datasets and Variables¶
collections:
  quarterly_revenue:
    metrics:
      - metric_type: mean
        data_format: record
        name: q1_revenue
        variable: revenue
        dataset: q1_sales
      - metric_type: mean
        data_format: record
        name: q2_revenue
        variable: revenue
        dataset: q2_sales
Complex Multi-dimensional Fan-out¶
collections:
  example:
    metrics:
      - name:
          - income_north
          - income_south
          - score_north
          - score_south
        variable:
          - annual_income
          - annual_income
          - credit_score
          - credit_score
        segment:
          - north
          - south
          - north
          - south
        metric_type: mean
        data_format: record
        dataset: customer_data
Usage Notes¶
- Numeric Data: Variable must contain numeric data types
- Missing Values: Automatically excluded from mean calculation
- Zero Values: Included in calculation unless explicitly filtered in data
- Segmentation: Each segment produces a separate mean calculation
- Large Datasets: Efficient calculation even with millions of records
Fan-out Expansion Rules¶
When using lists in configuration:
- name and segment must have matching lengths when specified as lists
- segment can be a single value (applied to all) or list matching other field lengths
- Each combination creates a separate metric calculation
- All metrics of this type will have the same output column structure
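The rules above can be sketched as a small Python function. This is a hedged illustration, not the library's code: `expand_fanout` and `FAN_OUT_FIELDS` are hypothetical names, and the sketch assumes that list fields are paired by position (as the examples on this page suggest) and that any single value is applied to every expanded metric.

```python
from typing import Any, Dict, List

# Assumed set of fields that may be given as lists.
FAN_OUT_FIELDS = ("name", "variable", "segment")


def expand_fanout(metric: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand list-valued fields position by position (hypothetical helper)."""
    lengths = {
        field: len(metric[field])
        for field in FAN_OUT_FIELDS
        if isinstance(metric.get(field), list)
    }
    # All list-valued fields must agree on length.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"List fields have mismatched lengths: {lengths}")
    count = next(iter(lengths.values()), 1)  # no lists at all: a single metric

    expanded = []
    for i in range(count):
        item = dict(metric)  # single values are carried over unchanged
        for field in lengths:
            item[field] = metric[field][i]
        expanded.append(item)
    return expanded


metric = {
    "name": ["avg_income", "avg_score"],
    "variable": ["annual_income", "credit_score"],
    "segment": "region",  # single value, applied to both metrics
    "metric_type": "mean",
    "data_format": "record",
    "dataset": "customer_data",
}
expanded = expand_fanout(metric)
for item in expanded:
    print(item["name"], item["variable"], item["segment"])
```

Validating lengths before expanding means a mismatched configuration fails loudly instead of silently dropping entries, which is what a bare `zip` would do.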
Statistical Notes¶
- Arithmetic Mean: Simple average of all values
- Outlier Sensitivity: Mean can be affected by extreme values; consider median for robust central tendency
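A small example using Python's standard `statistics` module shows how a single extreme value pulls the mean away from the bulk of the data while the median stays put. The income figures are made up for illustration.

```python
from statistics import mean, median

# Four typical incomes and one extreme value (illustrative data).
incomes = [42_000, 45_000, 47_000, 50_000, 1_000_000]

mean_income = mean(incomes)      # sum 1,184,000 divided by 5
median_income = median(incomes)  # middle value of the sorted list

print(mean_income)    # 236800
print(median_income)  # 47000
```

Here the mean is roughly five times larger than any typical value in the list, which is why the median is often reported alongside it for skewed variables.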