Root Mean Squared Error (RMSE) Metric
The rmse metric calculates the Root Mean Squared Error between predicted and observed values, summarizing in a single number how closely a model's predictions match the observed data.
Metric Type: rmse
RMSE Calculation
The RMSE is calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{observed}_i - \text{predicted}_i\right)^2}$$
Where:
- observed = Observed values
- predicted = Predicted values
- n = Number of observations
RMSE is expressed in the same units as the data being measured and is always non-negative. A value of 0 indicates perfect prediction accuracy.
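As a quick check of the formula, here is a minimal Python sketch for record-level data. The function name `rmse` is illustrative and not part of the metric's configuration or API; it simply applies sqrt(sum((observed - predicted)^2) / n) to two equal-length sequences.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: sqrt(sum((observed - predicted)^2) / n)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one observation is required")
    # Sum of squared residuals over all n observations
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / n)


# Residuals are 1, -1, 2 -> squares 1, 1, 4 -> mean 2 -> sqrt(2)
print(rmse([3.0, 5.0, 7.0], [2.0, 6.0, 5.0]))  # approx 1.4142
```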
Configuration Fields
Record-Level Data Format
For individual observation records:
```yaml
collections:
  model_rmse:
    metrics:
      - name:
          - prediction_accuracy
        data_format: record
        observed: observed_values
        predicted: predicted_values
        segment:
          - - model_version
    metric_type: rmse
    dataset: predictions
```
Summary-Level Data Format
For pre-aggregated error data:
```yaml
collections:
  summary_rmse:
    metrics:
      - name:
          - aggregated_rmse
        data_format: summary
        volume: observation_count
        sum_squared_errors: sse
        segment:
          - - data_source
    metric_type: rmse
    dataset: error_summary
```
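A minimal sketch of how summary-level inputs can be turned into RMSE, assuming that rows sharing the same segment values are pooled by summing their volumes and sums of squared errors before taking the square root. The helper `rmse_from_summary` is hypothetical; its default column names mirror the summary-level configuration (`observation_count`, `sse`, `data_source`).

```python
import math
from collections import defaultdict


def rmse_from_summary(rows, volume="observation_count", sse="sse", segment=("data_source",)):
    """Pool pre-aggregated rows per segment group and return {group_key: (volume, rmse)}.

    Assumption of this sketch: group RMSE = sqrt(total SSE / total volume).
    """
    totals = defaultdict(lambda: [0, 0.0])  # group key -> [total volume, total SSE]
    for row in rows:
        key = tuple(row[col] for col in segment)
        totals[key][0] += row[volume]
        totals[key][1] += row[sse]
    return {key: (n, math.sqrt(s / n)) for key, (n, s) in totals.items()}


rows = [
    {"data_source": "A", "observation_count": 4, "sse": 20.0},
    {"data_source": "A", "observation_count": 4, "sse": 12.0},
    {"data_source": "B", "observation_count": 10, "sse": 90.0},
]
print(rmse_from_summary(rows))  # {('A',): (8, 2.0), ('B',): (10, 3.0)}
```

Pooling before the square root is what makes the result match the record-level RMSE over the same observations; averaging per-row RMSE values instead would generally give a different number.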
Required Fields by Format
Record-Level Required
- name: Metric name(s)
- data_format: Must be "record"
- observed: Observed values column name
- predicted: Predicted values column name
- dataset: Dataset reference
Summary-Level Required
- name: Metric name(s)
- data_format: Must be "summary"
- volume: Volume count column name
- sum_squared_errors: Sum of squared errors column name
- dataset: Dataset reference
Optional Fields
- segment: List of segmentations for grouping; each entry is itself a list of column names (for example, `- - model_version`)
Output Columns
The metric produces the following output columns:
- group_key: Segmentation group identifier (struct of segment values)
- volume: Total number of observations
- rmse: Root Mean Squared Error value
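To illustrate the output columns, the following sketch computes one output row per segment group from record-level data. It assumes a single segmentation behaves like a plain group-by on its columns and represents `group_key` as a dictionary; the helper `rmse_by_segment` and that representation are illustrative, not the library's API.

```python
import math
from collections import defaultdict


def rmse_by_segment(rows, observed, predicted, segment):
    """Group record-level rows by the segment columns and return one output row per group."""
    groups = defaultdict(lambda: [0, 0.0])  # group key -> [volume, sum of squared errors]
    for row in rows:
        key = tuple(row[col] for col in segment)
        groups[key][0] += 1
        groups[key][1] += (row[observed] - row[predicted]) ** 2
    return [
        {
            "group_key": dict(zip(segment, key)),  # struct-like mapping of segment values
            "volume": n,
            "rmse": math.sqrt(sse / n),
        }
        for key, (n, sse) in groups.items()
    ]


rows = [
    {"model_version": "v1", "observed_values": 10.0, "predicted_values": 12.0},
    {"model_version": "v1", "observed_values": 8.0, "predicted_values": 6.0},
    {"model_version": "v2", "observed_values": 5.0, "predicted_values": 5.5},
]
for out in rmse_by_segment(rows, "observed_values", "predicted_values", ["model_version"]):
    print(out)
# {'group_key': {'model_version': 'v1'}, 'volume': 2, 'rmse': 2.0}
# {'group_key': {'model_version': 'v2'}, 'volume': 1, 'rmse': 0.5}
```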
Fan-out Examples
Single Configuration
```yaml
collections:
  basic_rmse:
    metrics:
      - name:
          - model_rmse
        data_format: record
        observed: actual_values
        predicted: predicted_values
    metric_type: rmse
    dataset: validation_data
```
Segmented Analysis
```yaml
collections:
  segmented_rmse:
    metrics:
      - name:
          - regional_rmse
          - product_rmse
        data_format: record
        observed: observed_values
        predicted: predicted_values
        segment:
          - - region
          - - product_type
    metric_type: rmse
    dataset: performance_data
```
Mixed Data Formats
```yaml
collections:
  detailed_rmse:
    metrics:
      - name:
          - record_rmse
        data_format: record
        observed: actual
        predicted: predicted
    metric_type: rmse
    dataset: detailed_data
  summary_rmse:
    metrics:
      - name:
          - summary_rmse
        data_format: summary
        volume: count
        sum_squared_errors: sse
    metric_type: rmse
    dataset: summary_data
```
Data Requirements
Record-Level Data
- One row per observation
- Observed column: numeric values (any numeric value is allowed)
- Predicted column: numeric values (any numeric value is allowed)
- Both columns must have the same units/scale
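The Data Quality note under Important Notes asks for missing values to be removed and data types to be numeric. One way to do that for record-level columns in plain Python is sketched below; the helpers `is_usable` and `drop_unusable` are hypothetical, and rejecting booleans is an assumption of this sketch rather than a documented rule.

```python
import math
from numbers import Real


def is_usable(value) -> bool:
    """True for real numbers that are not NaN; None, strings and booleans are rejected."""
    return isinstance(value, Real) and not isinstance(value, bool) and not math.isnan(value)


def drop_unusable(observed, predicted):
    """Drop every observation where the observed or the predicted value is unusable."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if is_usable(o) and is_usable(p)]
    return [o for o, _ in pairs], [p for _, p in pairs]


obs = [1.0, None, 3.0, float("nan"), 5.0]
pred = [1.5, 2.0, "3.0", 4.0, 4.0]
clean_obs, clean_pred = drop_unusable(obs, pred)
print(clean_obs, clean_pred)  # [1.0, 5.0] [1.5, 4.0]
```

Whole observations are dropped, not individual values, so the observed and predicted columns stay aligned row by row.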
Summary-Level Data
- One row per group/segment
- Volume counts: positive numbers
- Sum of squared errors: non-negative numbers (0 means every prediction in the group was exact)
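If only record-level data is available, summary-level inputs can be pre-aggregated from it: one row per group holding the observation count and the sum of squared errors. This is a sketch with the hypothetical helper `to_summary`; the output column names follow the summary-level configuration example (`observation_count`, `sse`).

```python
import math
from collections import defaultdict


def to_summary(rows, observed, predicted, segment):
    """Pre-aggregate record-level rows into one summary row per group (volume and SSE)."""
    acc = defaultdict(lambda: {"observation_count": 0, "sse": 0.0})
    for row in rows:
        key = tuple(row[col] for col in segment)
        acc[key]["observation_count"] += 1
        acc[key]["sse"] += (row[observed] - row[predicted]) ** 2
    # Each output row carries the segment values plus the two aggregated columns
    return [dict(zip(segment, key), **vals) for key, vals in acc.items()]


records = [
    {"data_source": "A", "actual": 10.0, "predicted": 13.0},
    {"data_source": "A", "actual": 4.0, "predicted": 5.0},
    {"data_source": "B", "actual": 7.0, "predicted": 9.0},
]
summary = to_summary(records, "actual", "predicted", ["data_source"])
for row in summary:
    # RMSE from the summary row equals the record-level RMSE for that group
    print(row, math.sqrt(row["sse"] / row["observation_count"]))
```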
RMSE Interpretation
Value Guidelines
- 0.0: Perfect prediction accuracy
- Low values: Good prediction accuracy (relative to data scale)
- High values: Poor prediction accuracy (relative to data scale)
Scale Considerations
- RMSE is in the same units as the observed data
- Compare RMSE values only for data with similar scales
- Use relative measures, such as RMSE divided by the mean of the observed values (RMSE/mean), for cross-scale comparisons
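A sketch of the relative measure, RMSE divided by the mean of the observed values. The helper names are hypothetical, and the ratio is only meaningful when the observed mean is positive and not close to zero. In the example, multiplying both series by 1000 multiplies the RMSE by the same factor but leaves the relative value unchanged.

```python
import math
from statistics import fmean


def rmse(observed, predicted):
    """Plain RMSE over two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_rmse(observed, predicted):
    """RMSE divided by the mean of the observed values (unitless)."""
    mean_obs = fmean(observed)
    if mean_obs == 0:
        raise ValueError("mean of observed values is zero; relative RMSE is undefined")
    return rmse(observed, predicted) / mean_obs


small_obs, small_pred = [10.0, 20.0, 30.0], [11.0, 19.0, 32.0]
large_obs = [v * 1000 for v in small_obs]
large_pred = [v * 1000 for v in small_pred]

print(rmse(small_obs, small_pred), rmse(large_obs, large_pred))  # approx 1.414 vs 1414.2
print(relative_rmse(small_obs, small_pred), relative_rmse(large_obs, large_pred))  # both approx 0.0707
```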
Important Notes
- Scale Sensitivity: RMSE grows with the magnitude of the data, so data measured on a larger scale will naturally tend to have a larger RMSE
- Outlier Sensitivity: RMSE is sensitive to outliers due to the squaring operation
- Units: RMSE results are in the same units as the input data
- Non-negative: RMSE values are always non-negative
- Data Quality: Remove missing values and ensure data types are numeric before calculation
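To make the outlier sensitivity concrete, the sketch below compares two sets of predictions with the same total absolute error (10 in both cases). Spreading the error evenly gives an RMSE of 1.0, while concentrating it in a single prediction more than triples the RMSE, because squaring weights large residuals more heavily. The `rmse` helper is illustrative.

```python
import math


def rmse(observed, predicted):
    """Plain RMSE over two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


observed = [10.0] * 10
predicted_small = [11.0] * 10            # every prediction off by 1
predicted_outlier = [10.0] * 9 + [20.0]  # nine exact predictions, one off by 10

print(rmse(observed, predicted_small))    # 1.0
print(rmse(observed, predicted_outlier))  # sqrt(100 / 10), approx 3.1623
```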