Root Mean Squared Error (RMSE) Metric

The rmse metric calculates the Root Mean Squared Error, which measures how closely predicted values match observed values. Lower values indicate a closer match.

Metric Type: rmse

RMSE Calculation

The RMSE is calculated as: sqrt(sum((observed - predicted)^2) / n)

Where:

  • observed = Observed values
  • predicted = Predicted values
  • n = Number of observations

RMSE is expressed in the same units as the data being measured and is always non-negative. A value of 0 indicates perfect prediction accuracy.
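
As an illustration of the formula above, the calculation can be sketched in plain Python. This is a minimal sketch, not the metric's actual implementation; the function name `rmse` and the sample values are assumptions made for the example.

```python
import math

def rmse(observed, predicted):
    """Root Mean Squared Error for two equal-length sequences of numbers."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one observation is required")
    # Sum of squared differences between observed and predicted values.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / n)

# Hypothetical sample data: the errors are 1, 0 and -2.
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
example_rmse = rmse(observed, predicted)  # sqrt((1 + 0 + 4) / 3)
```

Identical inputs give an RMSE of exactly 0, which matches the "perfect prediction" case described above.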

Configuration Fields

Record-Level Data Format

For individual observation records:

collections:
  model_rmse:
    metrics:
    - name:
      - prediction_accuracy
      data_format: record
      observed: observed_values
      predicted: predicted_values
      segment:
      - - model_version
      metric_type: rmse
    dataset: predictions

Summary-Level Data Format

For pre-aggregated error data:

collections:
  summary_rmse:
    metrics:
    - name:
      - aggregated_rmse
      data_format: summary
      volume: observation_count
      sum_squared_errors: sse
      segment:
      - - data_source
      metric_type: rmse
    dataset: error_summary
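
With summary-level data, the two configured columns are enough to recover RMSE, because the sum of squared errors is the numerator of the formula and the volume is n. The sketch below assumes this relationship; the helper name `rmse_from_summary` and the sample row are hypothetical, and only the column names `observation_count` and `sse` come from the configuration above.

```python
import math

def rmse_from_summary(sum_squared_errors, volume):
    """Compute RMSE from a pre-aggregated sum of squared errors and a count."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    if sum_squared_errors < 0:
        raise ValueError("sum_squared_errors must be non-negative")
    return math.sqrt(sum_squared_errors / volume)

# Hypothetical summary row: 4 observations whose squared errors sum to 36.
summary_row = {"observation_count": 4, "sse": 36.0}
summary_rmse = rmse_from_summary(summary_row["sse"], summary_row["observation_count"])
```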

Required Fields by Format

Record-Level Required

  • name: Metric name(s)
  • data_format: Must be "record"
  • observed: Observed values column name
  • predicted: Predicted values column name
  • dataset: Dataset reference

Summary-Level Required

  • name: Metric name(s)
  • data_format: Must be "summary"
  • volume: Volume count column name
  • sum_squared_errors: Sum of squared errors column name
  • dataset: Dataset reference

Optional Fields

  • segment: List of segment definitions, each a list of column names for grouping

Output Columns

The metric produces the following output columns:

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Total number of observations
  • rmse: Root Mean Squared Error value
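
To make the output shape concrete, the following sketch groups record-level rows by segment columns and emits one row per group with the three columns listed above. It is an approximation for illustration only: the function `rmse_by_segment`, the use of a plain dict for `group_key`, and the sample rows are assumptions, not the metric's actual implementation.

```python
import math
from collections import defaultdict

def rmse_by_segment(rows, observed, predicted, segment):
    """Group rows by the segment columns and return one output row per group."""
    # group key -> [volume, sum of squared errors]
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[col] for col in segment)
        totals[key][0] += 1
        totals[key][1] += (row[observed] - row[predicted]) ** 2
    return [
        {
            "group_key": dict(zip(segment, key)),  # stands in for the struct of segment values
            "volume": volume,
            "rmse": math.sqrt(sse / volume),
        }
        for key, (volume, sse) in sorted(totals.items())
    ]

# Hypothetical record-level rows using the column names from the record-level example.
rows = [
    {"model_version": "v1", "observed_values": 10.0, "predicted_values": 7.0},
    {"model_version": "v1", "observed_values": 4.0, "predicted_values": 8.0},
    {"model_version": "v2", "observed_values": 6.0, "predicted_values": 6.0},
]
result = rmse_by_segment(rows, "observed_values", "predicted_values", ["model_version"])
```

Here `result` contains one row for `v1` (volume 2) and one for `v2` (volume 1, RMSE 0 because the single prediction is exact).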

Fan-out Examples

Single Configuration

collections:
  basic_rmse:
    metrics:
    - name:
      - model_rmse
      data_format: record
      observed: actual_values
      predicted: predicted_values
      metric_type: rmse
    dataset: validation_data

Segmented Analysis

collections:
  segmented_rmse:
    metrics:
    - name:
      - regional_rmse
      - product_rmse
      data_format: record
      observed: observed_values
      predicted: predicted_values
      segment:
      - - region
      - - product_type
      metric_type: rmse
    dataset: performance_data

Mixed Data Formats

collections:
  detailed_rmse:
    metrics:
    - name:
      - record_rmse
      data_format: record
      observed: actual
      predicted: predicted
      metric_type: rmse
    dataset: detailed_data
  summary_rmse:
    metrics:
    - name:
      - summary_rmse
      data_format: summary
      volume: count
      sum_squared_errors: sse
      metric_type: rmse
    dataset: summary_data
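
The two formats should agree when the summary row is an exact aggregate of the underlying records. The sketch below checks this on hypothetical data; the variable names and values are assumptions for the example, and only `count` and `sse` mirror the column names in the configuration above.

```python
import math

# Hypothetical (actual, predicted) pairs; the errors are 2, -1, 0 and -2.
records = [(10.0, 8.0), (5.0, 6.0), (7.0, 7.0), (3.0, 5.0)]

# Record-level: apply the RMSE formula directly to each observation.
record_rmse = math.sqrt(sum((a - p) ** 2 for a, p in records) / len(records))

# Summary-level: pre-aggregate to a single row, then compute from count and sse.
summary = {
    "count": len(records),
    "sse": sum((a - p) ** 2 for a, p in records),
}
summary_rmse = math.sqrt(summary["sse"] / summary["count"])
```

Both paths divide the same sum of squared errors (9) by the same count (4), so they yield the same value.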

Data Requirements

Record-Level Data

  • One row per observation
  • Observed column: any numeric value
  • Predicted column: any numeric value
  • Both columns must have the same units/scale

Summary-Level Data

  • One row per group/segment
  • Volume counts: positive numbers
  • Sum of squared errors: non-negative numbers (0 corresponds to perfect prediction)

RMSE Interpretation

Value Guidelines

  • 0.0: Perfect prediction accuracy
  • Low values: Good prediction accuracy (relative to data scale)
  • High values: Poor prediction accuracy (relative to data scale)

Scale Considerations

  • RMSE is in the same units as the observed data
  • Compare RMSE values only for data with similar scales
  • Use relative measures (RMSE/mean) for cross-scale comparisons
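
One way to apply the relative measure mentioned above is to divide RMSE by the mean of the observed values. The sketch below assumes that choice of mean, which the text does not specify; the function name `normalized_rmse` and the sample data are hypothetical.

```python
import math

def normalized_rmse(observed, predicted):
    """RMSE divided by the absolute mean of the observed values (an assumed convention)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero; ratio is undefined")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / abs(mean_observed)

# Two hypothetical datasets with the same relative error at very different scales.
small_scale = normalized_rmse([100.0, 200.0], [110.0, 190.0])
large_scale = normalized_rmse([100000.0, 200000.0], [110000.0, 190000.0])
```

The raw RMSE values differ by a factor of 1000 (10 versus 10000), while the normalized values are the same, which is what makes the ratio usable across scales.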

Important Notes

  1. Scale Sensitivity: RMSE depends on the scale of the data, so data with larger magnitudes will tend to produce larger RMSE values
  2. Outlier Sensitivity: RMSE is sensitive to outliers due to the squaring operation
  3. Units: RMSE results are in the same units as the input data
  4. Non-negative: RMSE values are always non-negative
  5. Data Quality: Remove missing values and ensure data types are numeric before calculation
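
Two of the notes above, data quality and outlier sensitivity, can be illustrated together. This is a rough sketch under stated assumptions: the helpers `clean_pairs` and `rmse_of_pairs` and the sample values are hypothetical, and dropping incomplete pairs is only one possible way to handle missing values.

```python
import math

def clean_pairs(observed, predicted):
    """Keep only pairs where both values are present and finite numbers."""
    pairs = []
    for o, p in zip(observed, predicted):
        if o is None or p is None:
            continue  # drop pairs with a missing value
        o, p = float(o), float(p)
        if math.isfinite(o) and math.isfinite(p):
            pairs.append((o, p))  # NaN and infinite values are dropped too
    return pairs

def rmse_of_pairs(pairs):
    """RMSE over a list of (observed, predicted) pairs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

# Hypothetical data containing a missing value and a NaN.
observed = [10.0, None, 12.0, 11.0, float("nan")]
predicted = [11.0, 9.0, 11.0, 12.0, 10.0]

pairs = clean_pairs(observed, predicted)  # three valid pairs, each with error 1
baseline = rmse_of_pairs(pairs)
# Adding a single pair with an error of 10 dominates the result.
with_outlier = rmse_of_pairs(pairs + [(10.0, 20.0)])
```

In this example one outlier raises RMSE from 1.0 to roughly 5.07, because its squared error (100) outweighs the other three squared errors (1 each).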