
Comment by sigmoid10

5 days ago

While I usually enjoy seeing these discussions, I think they're really stretching the usefulness of Bayesian statistics. If one dude says the chance of an outcome is 8% and another says it's 16% and the outcome does occur, they were both pretty wrong, even though it might seem as if the one who guessed a few percent higher had a better belief system. Now if one of them had said 90% while the other said 8% or 16%, then we should pay close attention to what that person is saying.

The person who guessed 16% would have the lower Brier score (lower is better), and someone who had estimated 100% would, beyond simply being correct, get the lowest possible score.
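For concreteness, here is a minimal sketch of that comparison (the probabilities come from the thread; the code itself is just an illustration). The Brier score for a single binary event is the squared difference between the stated probability and the 0/1 outcome, so lower really is better.

```python
# Brier score for one binary event: (forecast probability - outcome)^2,
# where outcome is 1 if the event happened and 0 otherwise. Lower is better.
def brier(forecast: float, outcome: int) -> float:
    return (forecast - outcome) ** 2

outcome = 1  # the event did occur
for forecast in (0.08, 0.16, 0.90, 1.00):
    print(f"forecast {forecast:.0%} -> Brier score {brier(forecast, outcome):.4f}")
# 8% -> 0.8464, 16% -> 0.7056, 90% -> 0.0100, 100% -> 0.0000
```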

  • I'm not saying there aren't ways to measure this (Bayesian statistics does exist, after all); I'm saying the difference is not worth arguing about who was right, or even who had the better guess.

A 16% or even an 8% event happening is quite common, so it really tells us nothing and doesn't mean either one was pretty wrong.

From a mathematical point of view there are two factors: (1) the prior predictive capability of the human agents, and (2) the acceleration in the predicted event. Examining the result under such a model, we conclude that:

The greater the prior predictive power of the human agents, the greater the posterior acceleration of progress in LLMs (math capability).

Here we are supposing that the increase in training data is not the main explanatory factor.

This example is the germ of a general framework for assessing acceleration in LLM progress, and I think applying it to many data points could give us valuable information.

  • Another take at a sound interpretation:

    (1) Poor prior prediction capability of humans implies that the result does not provide any information.

    (2) Good prior prediction capability of humans implies that there is acceleration in the math capabilities of LLMs.
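A minimal sketch of that interpretation as a Bayes-rule calculation (every number below is an illustrative assumption, not an estimate): the observed event only moves the posterior on "acceleration" to the extent that well-calibrated forecasters said it was unlikely without acceleration.

```python
def posterior(prior_accel: float, p_event_if_accel: float, p_event_if_not: float) -> float:
    """Bayes' rule: P(acceleration | event occurred)."""
    num = prior_accel * p_event_if_accel
    return num / (num + (1 - prior_accel) * p_event_if_not)

prior = 0.3        # illustrative prior that LLM math capability is accelerating
p_if_accel = 0.8   # assumed probability of the event given real acceleration

# (2) Well-calibrated forecasters: their ~10% estimate really was the
# no-acceleration probability, so the event is strong evidence.
print(posterior(prior, p_if_accel, p_event_if_not=0.10))  # ~0.77

# (1) Poorly calibrated forecasters: the event was about as likely anyway,
# so the posterior barely moves and the result carries little information.
print(posterior(prior, p_if_accel, p_event_if_not=0.70))  # ~0.33
```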

The whole point is to make many such predictions and observe many outcomes. The goal is for your 70% predictions to be correct 70% of the time. We all have a gap between how confident we are and how often we're correct; calibration, which can be measured by making many predictions, is about reducing that gap.
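As a rough illustration of measuring that gap (the prediction records below are made up), you can bucket predictions by stated confidence and compare each bucket's stated probability with how often the events actually happened:

```python
from collections import defaultdict

# Each record: (stated probability, 1 if the event happened else 0).
# These records are invented purely for illustration.
predictions = [
    (0.7, 1), (0.7, 1), (0.7, 0), (0.7, 1), (0.7, 1),
    (0.2, 0), (0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0),
]

buckets = defaultdict(list)
for stated, outcome in predictions:
    buckets[stated].append(outcome)

# Calibration gap per bucket: stated confidence vs. observed frequency.
for stated, outcomes in sorted(buckets.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}: happened {freq:.0%} of the time ({len(outcomes)} predictions)")
```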

If I predict that my next die roll will be a 5 with 16% certainty and I do indeed roll a 5, was my prediction wrong?

The correctness of 8%, 16%, and 90% is equally unknown, since we only have one timeline, no?

  • That's why you have to let these people make predictions about many things. Then you can weigh the 8, 16, and 90 percent predictions and see who is talking out of their ass.

  • If one is calibrated to report proper percentages and assigns 8% to 25 distinct events, you should expect about 2 of the events to occur; 4 in the case of 16%, and 22.5 in the case of 90%. Assuming independence (as is sadly done too often), the standard math of binomial distributions can be applied to distinguish the predictions' accuracy probabilistically, despite no actual branching or experimental repetition taking place (see the sketch at the end of this thread).

  • This is probably the best thing I've ever read about predictions of the future. If we could run 80 parallel universes, then sure, it would make sense. But we only have the one [1]. If you're right and we get fast takeoff, it won't matter because we're all dead. In any case the number is meaningless; there is only ONE future.

    • You can make predictions of many different things, though, building a quantifiable track record. If one person is consistently and confidently wrong, that says something about their ability and methodology.
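To make the binomial point above concrete, here is a minimal sketch comparing how likely a hypothetical track record (25 events assigned the same probability, of which 4 occurred) would be under each claimed probability; the counts are invented for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent events, each with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 25        # events the forecaster assigned the same probability to
occurred = 4  # hypothetical number that actually happened

for claimed in (0.08, 0.16, 0.90):
    print(f"claimed {claimed:.0%}: P({occurred} of {n} occur) = {binom_pmf(occurred, n, claimed):.2e}")
```

The likelihood ratios between the three claimed probabilities are what let you separate the 90% forecaster from the 8% and 16% forecasters, even with only one timeline to observe.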