Mean Summary

The Mean Summary metric calculates arithmetic mean values for specified variables, with optional segmentation.

Configuration Fields

Required Fields

metric_type: Must be "mean"
data_format: Must be "record"
name (string or list): Metric identifier(s)
variable (string or list): Name(s) of the column(s) to average
dataset (string): Dataset identifier (from the collection default or metric override)

Optional Fields

segment (string or list): Segment identifier(s) for grouping analysis
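
The field rules above can be checked mechanically. The following is a minimal sketch, not the library's own validation: the function name `validate_mean_config` and the `collection_dataset` parameter are hypothetical, and the assumption is that a metric-level `dataset` overrides the collection default.

```python
REQUIRED = ("metric_type", "data_format", "name", "variable", "dataset")


def validate_mean_config(metric, collection_dataset=None):
    """Return a list of problems with one metric entry; empty means valid."""
    # Assumption: the metric's own dataset, if given, overrides the
    # collection-level default.
    merged = {"dataset": collection_dataset, **metric}
    problems = [
        f"missing required field '{field}'"
        for field in REQUIRED
        if merged.get(field) is None
    ]
    if merged.get("metric_type") not in (None, "mean"):
        problems.append('metric_type must be "mean"')
    if merged.get("data_format") not in (None, "record"):
        problems.append('data_format must be "record"')
    return problems


ok = validate_mean_config(
    {
        "name": "average_loan_amount",
        "variable": "loan_amount",
        "metric_type": "mean",
        "data_format": "record",
    },
    collection_dataset="loan_data",
)
bad = validate_mean_config(
    {"name": "x", "variable": "y", "metric_type": "median", "data_format": "record"}
)
print(ok)   # no problems reported
print(bad)  # reports the missing dataset and the wrong metric_type
```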

Output Columns

The Mean Summary produces these output columns:

Standard identification columns (name, dataset, segment)
mean: The calculated arithmetic mean value

Data Requirements

Data must contain the column(s) named in variable, with numeric values
Missing/null values are excluded from calculation
If segments are specified, data must contain the segment column(s)
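
As an illustration of these requirements, the sketch below checks a list of record dictionaries before any mean is computed. It is only one possible pre-flight check: `check_requirements` is a hypothetical helper, records are assumed to be plain dicts, and `None` is assumed to represent a missing value.

```python
from numbers import Real


def check_requirements(records, variable, segments=()):
    """Return a list of problems found; an empty list means the data qualifies."""
    problems = []
    for i, row in enumerate(records):
        # Every row needs the variable column and each segment column.
        for column in (variable, *segments):
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
        value = row.get(variable)
        # None stands in for a missing/null value, which is allowed.
        # bool is rejected explicitly because Python treats it as a number.
        if value is not None and (isinstance(value, bool) or not isinstance(value, Real)):
            problems.append(f"row {i}: '{variable}' is not numeric")
    return problems


records = [
    {"loan_amount": 1000.0, "region": "north"},
    {"loan_amount": None, "region": "south"},   # null: allowed, excluded later
    {"loan_amount": "n/a", "region": "south"},  # text: not numeric
    {"loan_amount": 500},                       # segment column absent
]
problems = check_requirements(records, "loan_amount", segments=["region"])
for problem in problems:
    print(problem)
```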

Fan-out Examples

Basic Configuration

collections:
  example:
    metrics:
    - name: average_loan_amount
      variable: loan_amount
      metric_type: mean
      data_format: record
    dataset: loan_data

Multiple Variables

collections:
  example:
    metrics:
    - name:
      - avg_income
      - avg_score
      - avg_debt
      variable:
      - annual_income
      - credit_score
      - total_debt
      metric_type: mean
      data_format: record
    dataset: customer_data

This expands to:

avg_income calculating mean of annual_income
avg_score calculating mean of credit_score
avg_debt calculating mean of total_debt
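
The expansion listed above pairs names and variables by position. A rough Python sketch of that pairing follows; `expand_pairs` is a hypothetical name used for illustration and does not reflect how the library performs the expansion internally.

```python
def expand_pairs(names, variables):
    """Pair each metric name with the variable at the same list position."""
    if len(names) != len(variables):
        raise ValueError("name and variable lists must have the same length")
    return [
        {"name": n, "variable": v, "metric_type": "mean", "data_format": "record"}
        for n, v in zip(names, variables)
    ]


metrics = expand_pairs(
    ["avg_income", "avg_score", "avg_debt"],
    ["annual_income", "credit_score", "total_debt"],
)
for m in metrics:
    print(f"{m['name']} calculating mean of {m['variable']}")
```

Each resulting dictionary has the same shape as a single-valued metric entry, so the list form can be read as shorthand for writing three separate entries.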

Segmented Analysis

collections:
  example:
    dataset: customer_data
    metrics:
      - metric_type: mean
        data_format: record
        name: regional_income_avg
        variable: annual_income
        segment: ["region"]

This creates separate mean calculations for each region.
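
To make the per-region behaviour concrete, here is a small sketch of a grouped mean over record dictionaries. The helper `mean_by_segment` and the sample rows are made up for illustration; the library's actual implementation may differ.

```python
from collections import defaultdict
from statistics import fmean


def mean_by_segment(records, variable, segment):
    """Group rows by the segment column and average the variable per group."""
    groups = defaultdict(list)
    for row in records:
        value = row.get(variable)
        if value is not None:  # missing values are left out of the mean
            groups[row[segment]].append(value)
    return {key: fmean(values) for key, values in groups.items()}


customers = [
    {"region": "north", "annual_income": 50000},
    {"region": "north", "annual_income": 70000},
    {"region": "south", "annual_income": 40000},
    {"region": "south", "annual_income": None},
]
result = mean_by_segment(customers, "annual_income", "region")
print(result)  # {'north': 60000.0, 'south': 40000.0}
```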

Multiple Datasets and Variables

collections:
  quarterly_revenue:
    metrics:
      - metric_type: mean
        data_format: record
        name: q1_revenue
        variable: revenue
        dataset: q1_sales
      - metric_type: mean
        data_format: record
        name: q2_revenue
        variable: revenue
        dataset: q2_sales

Complex Multi-dimensional Fan-out

collections:
  example:
    metrics:
    - name:
      - income_north
      - income_south
      - score_north
      - score_south
      variable:
      - annual_income
      - annual_income
      - credit_score
      - credit_score
      segment:
      - north
      - south
      - north
      - south
      metric_type: mean
      data_format: record
    dataset: customer_data

Usage Notes

Numeric Data: The variable column must hold numeric values
Missing Values: Automatically excluded from mean calculation
Zero Values: Included in calculation unless explicitly filtered in data
Segmentation: Each segment produces a separate mean calculation
Large Datasets: Efficient calculation even with millions of records
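
The difference between missing values and zeros is easy to get wrong, so a short sketch may help. It assumes `None` marks a missing value and that an all-missing input yields `None`; both are assumptions of this example, not documented behaviour.

```python
from statistics import fmean


def mean_excluding_missing(values):
    """Average the non-null values; zeros count as ordinary observations."""
    present = [v for v in values if v is not None]
    # Assumption: with nothing left to average, return None instead of raising.
    return fmean(present) if present else None


with_zero = mean_excluding_missing([10, 0, 20])        # 0 is included
with_missing = mean_excluding_missing([10, None, 20])  # None is dropped
print(with_zero, with_missing)  # 10.0 15.0
```

Replacing a missing value with 0 therefore changes the result, which is worth checking when upstream data cleaning fills nulls.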

Fan-out Expansion Rules

When using lists in configuration:

List-valued fields (name, variable, segment) must have matching lengths
segment can be a single value (applied to all) or a list matching the other field lengths
Entries are paired by position, and each position creates a separate metric calculation
All metrics of this type will have the same output column structure
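
These rules can be sketched as a small expansion function. This is an illustration under stated assumptions, not the library's code: `fan_out` is a hypothetical name, and repeating a scalar `name` or `variable` the same way as a scalar `segment` is an assumption made here for simplicity.

```python
def fan_out(name, variable, segment=None):
    """Expand scalar-or-list fields into one config dict per metric."""
    fields = {"name": name, "variable": variable, "segment": segment}
    lengths = {len(v) for v in fields.values() if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError(f"list fields have mismatched lengths: {sorted(lengths)}")
    count = lengths.pop() if lengths else 1
    # A scalar (or omitted) field is repeated for every expanded metric.
    columns = {
        key: value if isinstance(value, list) else [value] * count
        for key, value in fields.items()
    }
    return [{key: columns[key][i] for key in columns} for i in range(count)]


expanded = fan_out(
    ["avg_income", "avg_score"],
    ["annual_income", "credit_score"],
    segment="region",  # single value, applied to both metrics
)
for metric in expanded:
    print(metric)
```

Raising on mismatched lengths, instead of silently truncating as `zip` would, surfaces configuration mistakes early.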

Statistical Notes

Arithmetic Mean: Simple average of all values
Outlier Sensitivity: Mean can be affected by extreme values; consider median for robust central tendency
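
The outlier effect can be seen with a few made-up numbers. The income values below are hypothetical and chosen only to show how one extreme value moves the mean far more than the median.

```python
from statistics import fmean, median

incomes = [40_000, 45_000, 50_000, 55_000, 60_000]
with_outlier = incomes + [1_000_000]  # one extreme value added

mean_before, mean_after = fmean(incomes), fmean(with_outlier)
median_before, median_after = median(incomes), median(with_outlier)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```

Here the mean rises from 50,000 to roughly 208,333, while the median only moves from 50,000 to 52,500.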