Comment by mvATM99
2 days ago
Look, I'm optimistic about time-series foundation models too, but this post is hard to take seriously when the test is so flawed:
- Forward-filling short periods of missing values. Why keep this in when you explicitly mention this is not normal? Either remove it all or don't impute anything.
- Claiming superiority over classical models and then not including any of them in the results table.
- And, let's not forget, the cardinal sin of using MAPE as an evaluation metric.
Author here. We're trying these out for the first time for our use cases, so these are great points for us to improve upon!
Good to see feedback received positively! Sorry if my message came across as condescending; that was not the intent. I recommend reading this piece on metrics: https://openforecast.org/wp-content/uploads/2024/07/Svetunko.... It's easy to grasp, yet it contains great tips.
We're grateful for the honest feedback (and the awesome resource!); it makes it easier to identify areas for improvement. Also, your point about using multiple metrics (based on use case, audience, etc.) makes a lot of sense. We'll incorporate this in our next experiment.
To clarify, you'd prefer RMSLE?
Short answer: I use multiple metrics and never rely on just one.
Long answer: Is the metric for people with subject-matter knowledge? Then (Weighted) RMSSE, or the MASE alternative for a median forecast. WRMSSE is very nice: it can deal with zeroes, is scale-invariant, and is symmetric in penalizing under- and over-forecasting.
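For anyone following along, a minimal sketch of both scaled metrics in Python (function names are my own; the scaling denominator is the in-sample error of a one-step naive forecast, and WRMSSE is just a weighted average of per-series RMSSE, e.g. weighted by revenue):

    import numpy as np

    def rmsse(y_train, y_test, y_pred):
        # Squared forecast error scaled by the in-sample squared error
        # of a one-step naive forecast.
        naive_mse = np.mean(np.diff(y_train) ** 2)
        return np.sqrt(np.mean((y_test - y_pred) ** 2) / naive_mse)

    def mase(y_train, y_test, y_pred):
        # Absolute forecast error scaled by the in-sample absolute error
        # of a one-step naive forecast; suited to median forecasts.
        naive_mae = np.mean(np.abs(np.diff(y_train)))
        return np.mean(np.abs(y_test - y_pred)) / naive_mae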
Those scaled metrics are completely uninterpretable to people outside the forecasting sphere, though. For those cases I tend to just stick with raw errors; if a percentage metric is really necessary, then a weighted MAPE/RMSE: the weighting is still graspable for most people, and it doesn't explode with zeroes.
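One common weighted-MAPE formulation (sometimes called WMAPE; again, the names are mine):

    import numpy as np

    def wmape(y_true, y_pred):
        # Total absolute error over total actuals: each point is
        # implicitly weighted by its magnitude, so individual zeroes
        # in y_true don't blow it up.
        return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))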
I've also been exploring FVA (Forecast Value Added), computed against a second decent forecast. FVA is very intuitive, at least if your baseline measures are reliable. Aside from that, I always look at forecast plots. It's tedious, but they often tell you a lot that gets lost in the numbers.
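FVA is usually just the accuracy gap between your forecast and a baseline, in whatever error metric you already report. A sketch, assuming the wmape above:

    def fva(y_true, y_model, y_baseline, error_fn):
        # Positive: the model adds value over the baseline.
        # Negative: you'd be better off shipping the baseline.
        return error_fn(y_true, y_baseline) - error_fn(y_true, y_model)

    # e.g. FVA in WMAPE points versus a seasonal-naive baseline:
    # fva(y_true, y_model, y_seasonal_naive, wmape)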
RMSLE I haven't used much. From what I've read it looks interesting, though more for very specific scenarios (many outliers, high variance, nonlinear data?).
MAPE can also be a problem when rare excursions are what you want to predict and the cost of missing an event is much higher than the cost of predicting a non-event. A model that just predicts no change would have a very low MAPE because most of the time nothing happens. When the event does happen, however, the error of predicting the status quo ante is much worse than small baseline errors.
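A toy example (numbers are made up) makes this concrete: on a long, mostly flat series with one spike, the always-predict-no-change model scores a near-zero MAPE despite missing the only event that matters:

    import numpy as np

    def mape(y, yhat):
        return np.mean(np.abs((y - yhat) / y)) * 100

    y = np.full(100, 100.0)
    y[50] = 200.0                 # the one rare excursion
    flat = np.full(100, 100.0)    # always predicts the status quo

    print(mape(y, flat))          # 0.5 -- "excellent" MAPE, event missed entirely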
Thanks for the reply! I am outside the forecasting sphere.
RMSLE gives proportional error (so it's scale-invariant) without MAPE's systematic under-prediction bias. It does require non-negative values for the logarithm step.
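For reference, a minimal sketch of the common log1p form of RMSLE (which is why zeroes are tolerated but negatives are not):

    import numpy as np

    def rmsle(y_true, y_pred):
        # Errors on the log scale penalize ratios rather than absolute
        # differences, which is where the scale-invariance comes from.
        return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))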