Comment by gravypod
1 month ago
This is very interesting. I wonder if someone could create a future-sight benchmark for these models? Like, given a set of newspaper articles from the past N months, could it predict whether certain world events will happen? We could backtest against events that have occurred since the training cutoff.
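As a rough illustration, a minimal backtest harness along these lines might look like the sketch below (all names are hypothetical and not tied to any existing benchmark; `llm` stands in for whatever model call you'd use). The model only sees pre-cutoff articles and must output a probability, which is then scored against the known outcome with a Brier score:

    from dataclasses import dataclass

    @dataclass
    class Question:
        text: str            # e.g. "Will X happen before 2025-06-01?"
        articles: list[str]  # news articles published before the model's training cutoff
        outcome: bool        # how the event actually resolved after the cutoff

    def forecast(llm, q: Question) -> float:
        """Ask the model for a probability, given only pre-cutoff articles."""
        context = "\n\n".join(q.articles)
        prompt = (
            f"News articles:\n{context}\n\n"
            f"Question: {q.text}\n"
            "Reply with a single probability between 0 and 1."
        )
        reply = llm(prompt)  # hypothetical callable that returns the model's text reply
        return max(0.0, min(1.0, float(reply.strip())))

    def brier_score(llm, questions: list[Question]) -> float:
        """Mean squared error between forecasts and actual outcomes (lower is better)."""
        total = 0.0
        for q in questions:
            p = forecast(llm, q)
            total += (p - float(q.outcome)) ** 2
        return total / len(questions)

The key design point is that the resolution dates all fall after the model's training cutoff, so the score measures genuine out-of-sample forecasting rather than recall.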
FYI, ForecastBench [1] tests LLMs' out-of-sample forecasting accuracy.
The ForecastBench Tournament Leaderboard [2] lets external participants submit models, most of which add some sort of web search / news scaffolding to improve forecasting accuracy.
[1] https://www.forecastbench.org/
[2] https://www.forecastbench.org/tournament/
These days computers compete alongside humans in forecasting tournaments on Metaculus. They don't quite beat the top humans yet, but they're up there. https://www.metaculus.com/futureeval/