Root Mean Squared Error (RMSE) Metric

The rmse metric calculates the Root Mean Squared Error (RMSE), which measures how closely a model's predicted values match the observed values. Lower values indicate a closer match.

Metric Type: rmse

RMSE Calculation

The RMSE is calculated as: sqrt(sum((observed - predicted)^2) / n)

Where:

  • observed = Observed values
  • predicted = Predicted values
  • n = Number of observations

RMSE is expressed in the same units as the data being measured and is always non-negative. A value of 0 indicates perfect prediction accuracy.
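
The formula above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the metric's actual implementation; the function name rmse and the input checks are assumptions made for the example.

```python
import math

def rmse(observed, predicted):
    """Return sqrt(sum((observed - predicted)^2) / n) for two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one observation is required")
    sum_squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum_squared_errors / n)

# Errors are -3, 4, 0, 0, so the squared errors sum to 25 and RMSE = sqrt(25 / 4).
example = rmse([10, 20, 30, 40], [13, 16, 30, 40])
print(example)  # 2.5
```

The result, 2.5, is in the same units as the inputs, and identical observed and predicted sequences give 0.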

Configuration Fields

Record-Level Data Format

For individual observation records:

metrics:
  model_rmse:
    metric_type: "rmse"
    config:
      name: ["prediction_accuracy"]
      data_format: "record_level"
      observed: "observed_values" # Column with observed/actual values
      predicted: "predicted_values" # Column with predicted values
      segment: [["model_version"]] # Optional: segmentation columns
      dataset: "predictions"

Summary-Level Data Format

For pre-aggregated error data:

metrics:
  summary_rmse:
    metric_type: "rmse"
    config:
      name: ["aggregated_rmse"]
      data_format: "summary_level"
      volume: "observation_count" # Column with observation counts
      sum_squared_errors: "sse" # Column with sum of squared errors
      segment: [["data_source"]] # Optional: segmentation columns
      dataset: "error_summary"
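
To illustrate how the two formats relate, the sketch below collapses record-level values into a single summary row and then recovers RMSE as sqrt(sum_squared_errors / volume), which follows directly from the formula. The helper names summarize and rmse_from_summary are hypothetical; the column names are borrowed from the configuration above, and nothing here should be read as the metric's internal code.

```python
import math

def summarize(observed, predicted):
    """Collapse record-level values into one summary row (hypothetical helper)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {"observation_count": len(observed), "sse": sse}

def rmse_from_summary(row, volume="observation_count", sum_squared_errors="sse"):
    """RMSE from a pre-aggregated row: sqrt(sum_squared_errors / volume)."""
    return math.sqrt(row[sum_squared_errors] / row[volume])

row = summarize([10, 20, 30, 40], [13, 16, 30, 40])
print(row)                     # {'observation_count': 4, 'sse': 25}
print(rmse_from_summary(row))  # 2.5
```

Because only the count and the sum of squared errors are needed, a summary row carries enough information to reproduce the record-level result exactly.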

Required Fields by Format

Record-Level Required

  • name: Metric name(s)
  • data_format: Must be "record_level"
  • observed: Observed values column name
  • predicted: Predicted values column name
  • dataset: Dataset reference

Summary-Level Required

  • name: Metric name(s)
  • data_format: Must be "summary_level"
  • volume: Volume count column name
  • sum_squared_errors: Sum of squared errors column name
  • dataset: Dataset reference

Optional Fields

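  • segment: List of segmentation groups, where each group is itself a list of column names to group by (for example, [["region"], ["product_type"]])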

Output Columns

The metric produces the following output columns:

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Total number of observations
  • rmse: Root Mean Squared Error value
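
As a rough picture of what these output columns contain, the following sketch groups record-level rows by the segment columns and emits one row per group. The function segmented_rmse and the use of plain dictionaries are assumptions for illustration; the real output is described above only as a struct of segment values plus volume and rmse.

```python
import math
from collections import defaultdict

def segmented_rmse(records, observed, predicted, segment):
    """Group records by the segment columns and return one output row per group."""
    totals = defaultdict(lambda: [0, 0.0])  # group key -> [volume, sum of squared errors]
    for rec in records:
        key = tuple(rec[col] for col in segment)
        totals[key][0] += 1
        totals[key][1] += (rec[observed] - rec[predicted]) ** 2
    return [
        {
            "group_key": dict(zip(segment, key)),
            "volume": volume,
            "rmse": math.sqrt(sse / volume),
        }
        for key, (volume, sse) in sorted(totals.items())
    ]

records = [
    {"model_version": "v1", "observed_values": 10.0, "predicted_values": 13.0},
    {"model_version": "v1", "observed_values": 20.0, "predicted_values": 16.0},
    {"model_version": "v2", "observed_values": 30.0, "predicted_values": 30.0},
]
rows = segmented_rmse(records, "observed_values", "predicted_values", ["model_version"])
for row in rows:
    print(row)
```

Here the v1 group has a volume of 2 and the v2 group a volume of 1 with an rmse of 0.0, since its single prediction is exact.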

Fan-out Examples

Single Configuration

metrics:
  basic_rmse:
    metric_type: "rmse"
    config:
      name: ["model_rmse"]
      data_format: "record_level"
      observed: "actual_values"
      predicted: "predicted_values"
      dataset: "validation_data"

Segmented Analysis

metrics:
  segmented_rmse:
    metric_type: "rmse"
    config:
      name: ["regional_rmse", "product_rmse"]
      data_format: "record_level"
      observed: "observed_values"
      predicted: "predicted_values"
      segment: [["region"], ["product_type"]]
      dataset: "performance_data"
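
One plausible reading of this configuration is that each entry in name is paired by position with the matching entry in segment, producing one metric per pair. The sketch below shows that reading only; positional pairing is an assumption inferred from the example, and fan_out is a hypothetical helper rather than part of the tool.

```python
def fan_out(config):
    """Expand one config into per-metric configs, pairing names with segments by position.

    Assumption: positional pairing is inferred from the example, not documented behavior.
    """
    names = config["name"]
    segments = config.get("segment") or [None] * len(names)
    if len(segments) != len(names):
        raise ValueError("name and segment must have the same number of entries")
    shared = {k: v for k, v in config.items() if k not in ("name", "segment")}
    return [dict(shared, name=name, segment=segment) for name, segment in zip(names, segments)]

config = {
    "name": ["regional_rmse", "product_rmse"],
    "data_format": "record_level",
    "observed": "observed_values",
    "predicted": "predicted_values",
    "segment": [["region"], ["product_type"]],
    "dataset": "performance_data",
}
for expanded in fan_out(config):
    print(expanded["name"], expanded["segment"])
# regional_rmse ['region']
# product_rmse ['product_type']
```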

Mixed Data Formats

metrics:
  detailed_rmse:
    metric_type: "rmse"
    config:
      name: ["record_level_rmse"]
      data_format: "record_level"
      observed: "actual"
      predicted: "predicted"
      dataset: "detailed_data"

  summary_rmse:
    metric_type: "rmse"
    config:
      name: ["summary_rmse"]
      data_format: "summary_level"
      volume: "count"
      sum_squared_errors: "sse"
      dataset: "summary_data"

Data Requirements

Record-Level Data

  • One row per observation
  • Observed column: numeric values (any numeric value is allowed)
  • Predicted column: numeric values (any numeric value is allowed)
  • Both columns must have the same units/scale

Summary-Level Data

  • One row per group/segment
  • Volume counts: positive numbers
  • Sum of squared errors: non-negative numbers (0 when every prediction in the group is exact)

RMSE Interpretation

Value Guidelines

  • 0.0: Perfect prediction accuracy
  • Low values: Good prediction accuracy (relative to data scale)
  • High values: Poor prediction accuracy (relative to data scale)

Scale Considerations

  • RMSE is in the same units as the observed data
  • Compare RMSE values only for data with similar scales
  • Use relative measures (RMSE/mean) for cross-scale comparisons
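
The scale effect and the relative measure can be seen in a small example. The sketch below assumes that "mean" refers to the mean of the observed values, which is one common convention and not something this page specifies; relative_rmse is a name chosen for the example.

```python
import math

def rmse(observed, predicted):
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))

def relative_rmse(observed, predicted):
    """RMSE divided by the mean of the observed values (undefined when that mean is 0)."""
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is 0; relative RMSE is undefined")
    return rmse(observed, predicted) / mean_observed

# The same predictions expressed in two units (for example metres and millimetres).
obs_m, pred_m = [10, 20, 30, 40], [13, 16, 30, 40]
obs_mm = [v * 1000 for v in obs_m]
pred_mm = [v * 1000 for v in pred_m]

print(rmse(obs_m, pred_m), rmse(obs_mm, pred_mm))                    # 2.5 2500.0
print(relative_rmse(obs_m, pred_m), relative_rmse(obs_mm, pred_mm))  # 0.1 0.1
```

The raw RMSE changes by a factor of 1000 with the unit, while the relative value stays at 0.1, which is why it is the safer figure to compare across scales.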

Important Notes

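  1. Scale Sensitivity: RMSE depends on the scale of the data, so data with larger magnitudes will tend to produce larger RMSE values for comparable relative errors
  2. Outlier Sensitivity: RMSE is sensitive to outliers because errors are squared before averaging, so a few large errors can dominate the result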
  3. Units: RMSE results are in the same units as the input data
  4. Non-negative: RMSE values are always non-negative
  5. Data Quality: Remove missing values and ensure data types are numeric before calculation
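
For the data quality note, one way to prepare inputs is to drop any pair in which either value is missing or cannot be read as a number. The sketch below is a minimal example of that idea; clean_pairs is a hypothetical helper, and coercing numeric strings with float() is a choice made here, not a documented behavior of the metric.

```python
import math

def clean_pairs(observed, predicted):
    """Keep only pairs where both values are present, numeric, and not NaN."""
    cleaned = []
    for o, p in zip(observed, predicted):
        try:
            o_num, p_num = float(o), float(p)
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric value
        if math.isnan(o_num) or math.isnan(p_num):
            continue  # NaN counts as missing
        cleaned.append((o_num, p_num))
    return cleaned

pairs = clean_pairs([10, None, "30", float("nan"), 40], [13, 16, "30", 1.0, "n/a"])
print(pairs)  # [(10.0, 13.0), (30.0, 30.0)]
```

Dropping rows changes the observation count, so it is worth recording how many pairs were removed alongside the resulting RMSE.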