
Comment by sigmoid10

5 days ago

While I usually enjoy seeing these discussions, I think they're really stretching the usefulness of Bayesian statistics. If one dude says the chance of an outcome is 8% and another says it's 16% and the outcome does occur, they were both pretty wrong, even though it might seem as if the one who guessed a few percent higher had a better belief system. Now if one of them had said 90% while the other said 8% or 16%, then we should pay close attention to what that person is saying.

The person who guessed 16% would have the lower Brier score (lower is better), and someone who had estimated 100% would, beyond simply being correct, get the lowest possible score.
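For concreteness, here is a minimal sketch of that comparison (the probabilities come from the thread; the code itself is just an illustration). The Brier score for a single binary event is the squared difference between the stated probability and the 0/1 outcome, so lower really is better.

```python
# Brier score for one binary event: (forecast probability - outcome)^2,
# where outcome is 1 if the event happened and 0 otherwise. Lower is better.
def brier(forecast: float, outcome: int) -> float:
    return (forecast - outcome) ** 2

outcome = 1  # the event did occur
for forecast in (0.08, 0.16, 0.90, 1.00):
    print(f"forecast {forecast:.0%} -> Brier score {brier(forecast, outcome):.4f}")
# 8% -> 0.8464, 16% -> 0.7056, 90% -> 0.0100, 100% -> 0.0000
```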

  • I'm not saying there aren't ways to measure this (Bayesian statistics does exist, after all); I'm saying the difference is not worth arguing about who was right, or even who had the better guess.

A 16% or even an 8% event happening is quite common, so it really tells us nothing and doesn't mean either one was pretty wrong.

From a mathematical point of view there are two factors: (1) the prior predictive capability of the human agents, and (2) the acceleration in the predicted event. Examining the result under such a model, we conclude that:

The greater the prior predictive power of the human agents, the greater the posterior acceleration of progress in LLMs (math capability).

Here we are supposing that the increase in training data is not the main explanatory factor.

This example is the germ of a general framework for assessing acceleration in LLM progress, and I think applying it to many data points could give us valuable information.

  • Another take at a sound interpretation:

    (1) Poor prior prediction capability of humans implies that the result does not provide any information.

    (2) Good prior prediction capability of humans implies that there is acceleration in the math capabilities of LLMs.
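A minimal sketch of that interpretation as a Bayes-rule calculation (every number below is an illustrative assumption, not an estimate): the observed event only moves the posterior on "acceleration" to the extent that well-calibrated forecasters said it was unlikely without acceleration.

```python
def posterior(prior_accel: float, p_event_if_accel: float, p_event_if_not: float) -> float:
    """Bayes' rule: P(acceleration | event occurred)."""
    num = prior_accel * p_event_if_accel
    return num / (num + (1 - prior_accel) * p_event_if_not)

prior = 0.3        # illustrative prior that LLM math capability is accelerating
p_if_accel = 0.8   # assumed probability of the event given real acceleration

# (2) Well-calibrated forecasters: their ~10% estimate really was the
# no-acceleration probability, so the event is strong evidence.
print(posterior(prior, p_if_accel, p_event_if_not=0.10))  # ~0.77

# (1) Poorly calibrated forecasters: the event was about as likely anyway,
# so the posterior barely moves and the result carries little information.
print(posterior(prior, p_if_accel, p_event_if_not=0.70))  # ~0.33
```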

The whole point is to make many such predictions and observe many outcomes. The goal is for your 70% predictions to be correct 70% of the time. We all have a gap between how confident we are and how often we're correct; calibration, which can be measured by making many predictions, is about reducing that gap.
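As a rough illustration of measuring that gap (the prediction records below are made up), you can bucket predictions by stated confidence and compare each bucket's stated probability with how often the events actually happened:

```python
from collections import defaultdict

# Each record: (stated probability, 1 if the event happened else 0).
# These records are invented purely for illustration.
predictions = [
    (0.7, 1), (0.7, 1), (0.7, 0), (0.7, 1), (0.7, 1),
    (0.2, 0), (0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0),
]

buckets = defaultdict(list)
for stated, outcome in predictions:
    buckets[stated].append(outcome)

# Calibration gap per bucket: stated confidence vs. observed frequency.
for stated, outcomes in sorted(buckets.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}: happened {freq:.0%} of the time ({len(outcomes)} predictions)")
```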

If I predict that my next die roll will be a 5 with 16% certainty and I do indeed roll a 5, was my prediction wrong?

The correctness of 8%, 16%, and 90% is equally unknown, since we only have one timeline, no?

  • That's why you have to let these people make predictions about many things. Then you can weigh the 8, 16, and 90 percent predictions and see who is talking out of their ass.

  • If one is calibrated to report proper percentages and assigns 8% to 25 distinct events, you should expect about 2 of the events to occur; 4 in the case of 16%, and 22.5 in the case of 90%. Assuming independence (as is sadly done too often), the standard math of binomial distributions can be applied to distinguish the predictions' accuracy probabilistically, despite no actual branching or experimental repetition taking place (see the sketch at the end of this thread).

  • This is probably the best thing I've ever read about predictions of the future. If we could run 80 parallel universes, then sure, it would make sense. But we only have the one [1]. If you're right and we get fast takeoff, it won't matter because we're all dead. In any case the number is meaningless; there is only ONE future.

    • You can make predictions of many different things, though, building a quantifiable track record. If one person is consistently and confidently wrong, that says something about their ability and methodology.
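To make the binomial point above concrete, here is a minimal sketch comparing how likely a hypothetical track record (25 events assigned the same probability, of which 4 occurred) would be under each claimed probability; the counts are invented for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent events, each with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 25        # events the forecaster assigned the same probability to
occurred = 4  # hypothetical number that actually happened

for claimed in (0.08, 0.16, 0.90):
    print(f"claimed {claimed:.0%}: P({occurred} of {n} occur) = {binom_pmf(occurred, n, claimed):.2e}")
```

The likelihood ratios between the three claimed probabilities are what let you separate the 90% forecaster from the 8% and 16% forecasters, even with only one timeline to observe.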