Root Mean Squared Error (RMSE) Metric
The rmse metric calculates the Root Mean Squared Error between predicted and observed values, giving a single measure of how closely a model's predictions match the observed data.
Metric Type: rmse
RMSE Calculation
The RMSE is calculated as: sqrt(sum((observed - predicted)^2) / n)
Where:
- observed = Observed values
- predicted = Predicted values
- n = Number of observations
RMSE is expressed in the same units as the data being measured and is always non-negative. A value of 0 indicates perfect prediction accuracy.
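The formula translates directly into code. The following is a minimal Python sketch, assuming plain lists of floats; `rmse` is an illustrative helper, not the metric's actual implementation.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: sqrt(sum((observed - predicted)^2) / n)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one observation is required")
    # Sum of squared differences between each observed/predicted pair
    sum_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum_sq / n)


# Errors are 1, -1, 2 -> squares 1, 1, 4 -> sum 6 -> sqrt(6 / 3) = sqrt(2)
print(rmse([3.0, 5.0, 7.0], [2.0, 6.0, 5.0]))  # ≈ 1.4142
```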
Configuration Fields
Record-Level Data Format
For individual observation records:
metrics:
  model_rmse:
    metric_type: "rmse"
    config:
      name: ["prediction_accuracy"]
      data_format: "record_level"
      observed: "observed_values"      # Column with observed/actual values
      predicted: "predicted_values"    # Column with predicted values
      segment: [["model_version"]]     # Optional: segmentation columns
      dataset: "predictions"
Summary-Level Data Format
For pre-aggregated error data:
metrics:
  summary_rmse:
    metric_type: "rmse"
    config:
      name: ["aggregated_rmse"]
      data_format: "summary_level"
      volume: "observation_count"      # Column with observation counts
      sum_squared_errors: "sse"        # Column with sum of squared errors
      segment: [["data_source"]]       # Optional: segmentation columns
      dataset: "error_summary"
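Because RMSE is the square root of the total squared error divided by the total count, summary rows are combined by summing both columns before taking the root, not by averaging per-row RMSE values. The following is a minimal sketch, assuming each row is a (volume, sum_squared_errors) pair; `rmse_from_summary` is a hypothetical helper, not part of the metric's API.

```python
import math
from typing import Iterable, Tuple


def rmse_from_summary(rows: Iterable[Tuple[int, float]]) -> float:
    """Combine (volume, sum_squared_errors) rows into a single RMSE.

    RMSE = sqrt(total SSE / total volume): the record-level formula
    with n replaced by the summed volume.
    """
    total_volume = 0
    total_sse = 0.0
    for volume, sse in rows:
        total_volume += volume
        total_sse += sse
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return math.sqrt(total_sse / total_volume)


# Two pre-aggregated rows: 3 observations with SSE 6, 5 observations with SSE 10.
print(rmse_from_summary([(3, 6.0), (5, 10.0)]))  # sqrt(16 / 8) = sqrt(2) ≈ 1.4142
```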
Required Fields by Format
Record-Level Required
- name: Metric name(s)
- data_format: Must be "record_level"
- observed: Observed values column name
- predicted: Predicted values column name
- dataset: Dataset reference
Summary-Level Required
- name: Metric name(s)
- data_format: Must be "summary_level"
- volume: Volume count column name
- sum_squared_errors: Sum of squared errors column name
- dataset: Dataset reference
Optional Fields
- segment: List of column-name lists used for grouping; each inner list defines one segmentation (for example, [["region"], ["product_type"]])
Output Columns
The metric produces the following output columns:
- group_key: Segmentation group identifier (struct of segment values)
- volume: Total number of observations
- rmse: Root Mean Squared Error value
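One way to picture how these columns arise from record-level data is to group rows by the segment columns and compute a count and an RMSE per group. The sketch below assumes records are plain dicts and models `group_key` as a dict of segment column to value (a stand-in for the struct); `segmented_rmse` is a hypothetical helper, not the metric's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def segmented_rmse(
    records: Sequence[Dict[str, object]],
    observed: str,
    predicted: str,
    segment: Sequence[str],
) -> List[Dict[str, object]]:
    """Group records by the segment columns and emit group_key, volume, rmse."""
    # Accumulate [count, sum of squared errors] per tuple of segment values
    totals: Dict[Tuple[object, ...], List[float]] = defaultdict(lambda: [0, 0.0])
    for rec in records:
        key = tuple(rec[col] for col in segment)
        err = float(rec[observed]) - float(rec[predicted])
        totals[key][0] += 1
        totals[key][1] += err * err
    rows = []
    for key, (volume, sse) in totals.items():
        rows.append({
            # Dict of segment column -> value, standing in for a struct
            "group_key": dict(zip(segment, key)),
            "volume": volume,
            "rmse": math.sqrt(sse / volume),
        })
    return rows


data = [
    {"model_version": "v1", "observed_values": 3.0, "predicted_values": 2.0},
    {"model_version": "v1", "observed_values": 5.0, "predicted_values": 6.0},
    {"model_version": "v2", "observed_values": 7.0, "predicted_values": 5.0},
]
# v1: errors 1 and -1 -> volume 2, rmse 1.0; v2: error 2 -> volume 1, rmse 2.0
for row in segmented_rmse(data, "observed_values", "predicted_values", ["model_version"]):
    print(row)
```

Passing an empty `segment` list yields a single group with an empty `group_key`, which corresponds to an unsegmented configuration.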
Fan-out Examples
Single Configuration
metrics:
  basic_rmse:
    metric_type: "rmse"
    config:
      name: ["model_rmse"]
      data_format: "record_level"
      observed: "actual_values"
      predicted: "predicted_values"
      dataset: "validation_data"
Segmented Analysis
metrics:
  segmented_rmse:
    metric_type: "rmse"
    config:
      name: ["regional_rmse", "product_rmse"]
      data_format: "record_level"
      observed: "observed_values"
      predicted: "predicted_values"
      segment: [["region"], ["product_type"]]
      dataset: "performance_data"
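In this example, the two entries in `name` line up with the two inner lists in `segment`, so the configuration fans out into one RMSE segmented by region and one segmented by product type. The sketch below illustrates that pairing under an assumption: names and segment lists are matched by position and must have equal length. The real framework may handle mismatched lengths differently, and `fan_out` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def fan_out(names: Sequence[str], segments: Sequence[Sequence[str]]) -> List[Tuple[str, List[str]]]:
    """Pair each metric name with its segmentation columns by position (assumed semantics)."""
    if len(names) != len(segments):
        raise ValueError("name and segment lists must have the same length")
    return [(name, list(cols)) for name, cols in zip(names, segments)]


print(fan_out(["regional_rmse", "product_rmse"], [["region"], ["product_type"]]))
# [('regional_rmse', ['region']), ('product_rmse', ['product_type'])]
```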
Mixed Data Formats
metrics:
  detailed_rmse:
    metric_type: "rmse"
    config:
      name: ["record_level_rmse"]
      data_format: "record_level"
      observed: "actual"
      predicted: "predicted"
      dataset: "detailed_data"
  summary_rmse:
    metric_type: "rmse"
    config:
      name: ["summary_rmse"]
      data_format: "summary_level"
      volume: "count"
      sum_squared_errors: "sse"
      dataset: "summary_data"
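When summary_data is an aggregation of the same records as detailed_data, the two entries should report the same RMSE, because summary-level input only pre-computes the count and the sum of squared errors. The following is a sketch, assuming the count and sse columns were built this way; `to_summary`, `rmse_records`, and `rmse_summary` are hypothetical helpers.

```python
import math
from typing import Sequence, Tuple


def to_summary(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[int, float]:
    """Collapse record-level pairs into one (volume, sum_squared_errors) row."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return len(observed), sse


def rmse_records(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE computed directly from records."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rmse_summary(volume: int, sse: float) -> float:
    """RMSE computed from a pre-aggregated row."""
    return math.sqrt(sse / volume)


actual = [10.0, 12.0, 9.0, 15.0]
pred = [11.0, 12.5, 8.0, 13.0]
volume, sse = to_summary(actual, pred)
print(volume, sse)                  # 4 6.25
print(rmse_summary(volume, sse))    # 1.25
print(rmse_records(actual, pred) == rmse_summary(volume, sse))  # True
```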
Data Requirements
Record-Level Data
- One row per observation
- Observed column: numeric values (any numeric value is allowed)
- Predicted column: numeric values (any numeric value is allowed)
- Both columns must have the same units/scale
Summary-Level Data
- One row per group/segment
- Volume counts: positive numbers
- Sum of squared errors: non-negative numbers (0 is valid when every prediction in the group is exact)
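A quick pre-flight check on summary-level rows can catch invalid inputs before the metric runs. Note that a sum of squared errors of exactly 0 is mathematically valid (all predictions exact), so the sketch below rejects only negative values. It assumes rows are (volume, sum_squared_errors) pairs; `validate_summary` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def validate_summary(rows: Iterable[Tuple[float, float]]) -> List[str]:
    """Return a list of problems found in (volume, sum_squared_errors) rows."""
    problems = []
    for i, (volume, sse) in enumerate(rows):
        if volume <= 0:
            problems.append(f"row {i}: volume must be positive, got {volume}")
        if sse < 0:
            problems.append(f"row {i}: sum of squared errors must be non-negative, got {sse}")
    return problems


print(validate_summary([(100, 42.5), (0, 1.0), (10, -3.0)]))
# ['row 1: volume must be positive, got 0', 'row 2: sum of squared errors must be non-negative, got -3.0']
```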
RMSE Interpretation
Value Guidelines
- 0.0: Perfect prediction accuracy
- Low values: Good prediction accuracy (relative to data scale)
- High values: Poor prediction accuracy (relative to data scale)
Scale Considerations
- RMSE is in the same units as the observed data
- Compare RMSE values only for data with similar scales
- Use relative measures, such as RMSE divided by the mean of the observed values, for cross-scale comparisons
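Rescaling the data by a constant rescales RMSE by the same factor, while RMSE divided by the observed mean stays the same. The following sketch illustrates this, assuming positive-valued observed data (the ratio is undefined for a zero mean and hard to interpret for a negative one). `rmse` and `relative_rmse` are illustrative helpers.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean of the observed values (unitless)."""
    mean_obs = sum(observed) / len(observed)
    if mean_obs == 0:
        raise ValueError("relative RMSE is undefined when the observed mean is zero")
    return rmse(observed, predicted) / mean_obs


obs = [3.0, 5.0, 7.0]
pred = [2.0, 6.0, 5.0]
scaled_obs = [x * 1000 for x in obs]    # same data in units 1000x smaller
scaled_pred = [x * 1000 for x in pred]

print(rmse(obs, pred), rmse(scaled_obs, scaled_pred))                    # absolute RMSE grows 1000x
print(relative_rmse(obs, pred), relative_rmse(scaled_obs, scaled_pred))  # both ≈ 0.2828
```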
Important Notes
- Scale Sensitivity: RMSE grows with the magnitude of the data, so data with larger values will typically produce larger RMSE values even at the same relative accuracy
- Outlier Sensitivity: Because errors are squared before averaging, a few large errors can dominate the RMSE
- Units: RMSE results are in the same units as the input data
- Non-negative: RMSE values are always non-negative
- Data Quality: Remove missing values and ensure data types are numeric before calculation
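The data quality and outlier notes can both be seen in a small example. The sketch below drops any observed/predicted pair in which either value is missing (None or NaN), then shows how a single large error inflates RMSE. It assumes plain Python lists; `complete_pairs` and `rmse` are hypothetical helpers, and dropping incomplete pairs is one reasonable cleaning choice, not necessarily what the metric does internally.

```python
import math
from typing import List, Optional, Sequence, Tuple


def complete_pairs(
    observed: Sequence[Optional[float]], predicted: Sequence[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Drop any pair where either value is None or NaN."""
    obs_out, pred_out = [], []
    for o, p in zip(observed, predicted):
        if o is None or p is None or math.isnan(o) or math.isnan(p):
            continue
        obs_out.append(float(o))
        pred_out.append(float(p))
    return obs_out, pred_out


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


obs, pred = complete_pairs([3.0, None, 5.0, 7.0, float("nan")], [2.0, 4.0, 6.0, 5.0, 1.0])
print(len(obs), rmse(obs, pred))  # 3 pairs kept; RMSE = sqrt(2) ≈ 1.4142

# Outlier sensitivity: one error of 20 dominates the squared sum.
print(rmse([3.0, 5.0, 7.0, 30.0], [2.0, 6.0, 5.0, 10.0]))  # sqrt((1 + 1 + 4 + 400) / 4) ≈ 10.07
```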