Average of different prediction horizons as a metric? #66
IIRC that's a convention inherited from Informer and the follow-up works to it that came out between this repo's initial release and its more recent versions. The accuracy at individual timesteps into the future can be arbitrary and hard to interpret: 1-step predictions are too easy, while distant predictions can be very difficult given a fixed-length context window that may be too short. In highly periodic domains some distant horizons can also be easy (such as 24 hours ahead in a dataset with clear daily periodicity, like weather forecasting). So reporting a metric for every horizon takes a lot of explaining, requires large tables, and can be misleading. Averaging gives a better sense of the model's performance over the entire duration we care about.

At a few points during this project I hacked together logging of the accuracy at each individual timestep as a sanity check. In my experience you can expect roughly linearly increasing error as you predict further into the future.

As far as replicating the results on these datasets in your own project, double-check that you aren't counting missing datapoints in the metrics. This can make a huge difference and is something a lot of the literature (and early versions of this codebase) got wrong.
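To make the missing-datapoint point concrete, here is a minimal sketch (not the code this repo actually uses) of metrics averaged over the full prediction horizon with missing ground-truth values masked out; the function name, the `null_value` argument, and the `(batch, horizon, variables)` array shape are assumptions for illustration:

```python
import numpy as np

def masked_horizon_metrics(y_true, y_pred, null_value=np.nan):
    """MSE/MAE averaged over every timestep of the prediction horizon,
    skipping positions that are missing in the ground truth.

    y_true, y_pred: arrays of shape (batch, horizon, variables).
    null_value: placeholder used for missing datapoints
        (np.nan here; some datasets use 0.0 or another sentinel).
    """
    if isinstance(null_value, float) and np.isnan(null_value):
        mask = ~np.isnan(y_true)
    else:
        mask = y_true != null_value

    # Zero out errors at missing positions so they contribute nothing.
    err = np.where(mask, y_pred - y_true, 0.0)
    n = mask.sum()  # count only the observed datapoints
    return {"mse": (err ** 2).sum() / n, "mae": np.abs(err).sum() / n}
```

Dividing by the count of observed points (rather than the full array size) is what changes the numbers when a dataset has many gaps.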
I agree with Jake: averaging over the whole prediction horizon makes sense in order to compare single numbers as a metric. They report RMSE (I guess this is averaged over the whole horizon). It would be good to have more standardized metrics. This is not a question, just a comment, sorry for the spam! 😁
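For reference, one common way to collapse an $H$-step horizon into a single RMSE is to pool the squared errors across every step before taking the root; this is only a guess at what "averaged over the whole horizon" means here, not something confirmed by the paper being discussed:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N \cdot H}\sum_{i=1}^{N}\sum_{h=1}^{H}\left(\hat{y}_{i,h} - y_{i,h}\right)^{2}}$$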
Yeah, the traffic datasets / literature are the main example where reporting multiple horizons is the default. The longest horizons there are 12 timesteps, so that can be feasible. Once you get longer than that, it stops making sense to report arbitrary intervals in tables, in my opinion.

It would be interesting if the convention for reporting forecasting results were a plot of error over forecast duration for each dataset. That wasn't necessary at the time (2021), but I think this is probably what I would do if I were to redo this project today...
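A rough sketch of that kind of plot, assuming matplotlib and a hypothetical `(num_windows, horizon, variables)` layout; the missing-value masking from the earlier comment is omitted here for brevity:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_error_vs_horizon(y_true, y_pred, metric="mae"):
    """Plot error as a function of how far ahead the prediction is.

    y_true, y_pred: arrays of shape (num_windows, horizon, variables).
    Returns the per-step error so it can also be logged or tabulated.
    """
    err = y_pred - y_true
    if metric == "mae":
        per_step = np.abs(err).mean(axis=(0, 2))        # one value per horizon step
    else:
        per_step = np.sqrt((err ** 2).mean(axis=(0, 2)))  # per-step RMSE

    steps = np.arange(1, len(per_step) + 1)
    plt.plot(steps, per_step, marker="o")
    plt.xlabel("Forecast horizon (timesteps ahead)")
    plt.ylabel(metric.upper())
    plt.title("Error vs. forecast duration")
    plt.show()
    return per_step
```

If the per-step error really does grow roughly linearly, as described above, this kind of curve conveys that in one glance where a table of selected horizons would not.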
Hello Authors,
Could you please clarify the usage of the average over different prediction horizons as a benchmarking metric? Why was it used, and how is its validity justified?
I am doing a similar project and trying to report values at different horizons. My model is not getting values close to those reported by SOTA (top-5) models like yours. Could you please help with the intuition behind reporting the average rather than individual horizons?
Thanks,
Santosh