Comment by z7

5 days ago

Some previous predictions:

In 2021 Paul Christiano wrote he would update from 30% to "50% chance of hard takeoff" if we saw an IMO gold by 2025.

He thought there was an 8% chance of this happening.

Eliezer Yudkowsky said "at least 16%".

Source:

https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challe...

While I usually enjoy seeing these discussions, I think they stretch the usefulness of Bayesian statistics. If one dude says the chance of an outcome is 8% and another says it's 16% and the outcome does occur, they were both pretty wrong, even though it might seem like the one who guessed a few percent higher had the better belief system. Now if one of them had said 90% while the other said 8% or 16%, then we should pay close attention to what they are saying.

  • The person who guessed 16% would have a lower Brier score (lower is better) and someone who estimated 100%, beyond being correct, would have the lowest possible value.
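
    For a single yes/no event the Brier score is just the squared error between the stated probability and the outcome, so the arithmetic is easy to check. A minimal Python sketch (my own illustration, scoring the event as having happened):

      def brier(p, outcome):
          # squared error between stated probability and what happened (1 = it happened)
          return (p - outcome) ** 2

      for p in (0.08, 0.16, 0.90, 1.00):
          print(f"forecast {p:.2f} -> Brier {brier(p, 1):.4f}")
      # forecast 0.08 -> Brier 0.8464
      # forecast 0.16 -> Brier 0.7056
      # forecast 0.90 -> Brier 0.0100
      # forecast 1.00 -> Brier 0.0000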

    • I'm not saying there aren't ways to measure this (Bayesian statistics does exist, after all); I'm saying the difference is not worth arguing over who was right, or even who had the better guess.

  • A 16% or even an 8% event happening is quite common, so really it tells us nothing; it doesn't mean either one was pretty wrong.

  • From a mathematical point of view there are two factors: (1) the prior predictive capability of the human agents and (2) acceleration in the predicted event. Examining the result under such a model, we conclude:

    The more prior predictive power the human agents had, the more posterior acceleration of progress in LLMs (math capability) we should infer.

    Here we are supposing that the increase in training data is not the main explanatory factor.

    This example is the germ of a general framework for assessing acceleration in LLM progress, and I think applying it to many data points could give us valuable information.

    • Another attempt at a sound interpretation:

      (1) Poor prior predictive capability of the humans implies that the result provides no information.

      (2) Good prior predictive capability of the humans implies that there is acceleration in the math capabilities of LLMs.

  • The whole point is to make many such predictions and experience many outcomes. The goal is for your 70% predictions to be correct 70% of the time. We all have a gap between how confident we are and how often we're correct. Calibration, which can be measured by making many predictions, is about reducing that gap.
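
    A minimal sketch of what that measurement looks like (illustrative Python; the track record is invented):

      from collections import defaultdict

      # each entry: (stated probability, did it happen?) in a hypothetical track record
      record = [(0.7, True), (0.7, True), (0.7, False), (0.3, False),
                (0.3, True), (0.9, True), (0.9, True), (0.1, False)]

      by_bucket = defaultdict(list)
      for p, happened in record:
          by_bucket[p].append(happened)

      # a well-calibrated forecaster's 70% bucket should come true about 70% of the time
      for p, outcomes in sorted(by_bucket.items()):
          rate = sum(outcomes) / len(outcomes)
          print(f"said {p:.0%}: happened {rate:.0%} of the time ({len(outcomes)} predictions)")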

  • If I predict that my next dice roll will be a 5 with 16% certainty and I do indeed roll a 5, was my prediction wrong?

  • The correctness of 8%, 16%, and 90% is equally unknown, since we only have one timeline, no?

    • That's why you have to let these people make predictions about many things. Then you can weigh the 8, 16, and 90 percent and see who is talking out of their ass.

    • If one is calibrated to report proper percentages and assigns 8% to 25 distinct events, you should expect 2 of the events to occur; 4 in the case of 16%, and 22.5 in the case of 90%. Assuming independence (as is sadly too often done), the standard math of binomial distributions can be applied to distinguish the predictions' accuracy probabilistically, despite no actual branching or experimental repetition taking place.
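
      A sketch of that standard math in Python (stdlib only; the observed count of 4 is hypothetical):

        from math import comb

        def binom_pmf(k, n, p):
            # probability of exactly k of n independent events occurring at rate p
            return comb(n, k) * p**k * (1 - p) ** (n - k)

        n, observed = 25, 4  # suppose 4 of the 25 predicted events occurred
        for claimed in (0.08, 0.16, 0.90):
            print(f"P({observed}/{n} | p={claimed}) = {binom_pmf(observed, n, claimed):.1e}")
        # ~9.0e-02 at 8%, ~2.1e-01 at 16%, ~8.3e-18 at 90%:
        # this outcome is overwhelmingly more consistent with the low forecasts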

    • This is probably the best thing I've ever read about predictions of the future. If we could run 80 parallel universes, then sure, it would make sense. But we only have the one [1]. If you're right and we get fast takeoff, it won't matter, because we're all dead. In any case the number is meaningless; there is only ONE future.

Impressive prediction, especially pre-ChatGPT. Compare to Gary Marcus 3 months ago: https://garymarcus.substack.com/p/reports-of-llms-mastering-...

We may certainly hope Eliezer's other predictions don't prove so well-calibrated.

Context? Who are these people and what are these numbers and why shouldn't I assume they're pulled from thin air?

  • > why shouldn't I assume they're pulled from thin air?

    You definitely should assume they are. They are rationalists; the modus operandi is to pull stuff out of thin air and slap a single-digit-precision percentage on the front to make it seem grounded in science and well thought out.

  • You should basically assume they are pulled from thin air. (Or more precisely, from the brain and world model of the people making the prediction.)

    The point of giving such estimates is mostly an exercise in getting better at understanding the world, and a way to keep yourself honest by making predictions in advance. If someone else consistently gives higher probabilities to events that ended up happening than you did, then that's an indication that there's space for you to improve your prediction ability. (The quantitative way to compare these things is to see who has lower log loss [1].)

    [1] https://en.wikipedia.org/wiki/Cross-entropy
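
    A minimal sketch of that comparison for the single IMO event (my own illustration; natural-log loss, lower is better):

      import math

      def log_loss(p, happened):
          # penalty for one prediction: minus the log of the probability assigned to reality
          return -math.log(p if happened else 1 - p)

      # the event occurred, so the higher forecast is penalized less
      for p in (0.08, 0.16):
          print(f"forecast {p:.0%}: log loss {log_loss(p, True):.3f}")
      # forecast 8%: log loss 2.526
      # forecast 16%: log loss 1.833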

    • > If someone else consistently gives higher probabilities to events that ended up happening than you did, then that's an indication that there's space for you to improve your prediction ability.

      Your inference seems ripe for scams.

      For example: if I find out that a critical mass of participants aren't measuring how many of them would be expected to outrank the rest by random chance, I can organize a simplistic service that charges the losers for access to the ostensible "mentors."

      I think this happened with the stock market: you work out how many mutual fund managers would be expected to beat the market by random chance over a given period, and then you find roughly that same (small) number of managers who did beat the market, and who then switched to a more lucrative career of giving speeches about how to beat the market. :)

    • Is there some database where you can see predictions of different people and the results? Or are we supposed to rely on them keeping track and keeping themselves honest? Because that is not something humans do generally, and I have no reason to trust any of these 'rationalists'.

      This sounds like a circular argument. You started explaining why their giving percentage predictions should make them more trustworthy, but when I look into the details, I keep coming back to "just trust them".

  • >Who are these people

    Clowns, mostly. Yudkowsky in particular, whose only job these days seems to be making awful predictions and letting LessWrong eat it up when one in a hundred ends up coming true, solidifying his position as the AI-will-destroy-the-world messiah. They make money from these outlandish takes, and more money when you keep talking about them.

    It's kind of like listening to the local drunkard at the bar who once in a while ends up predicting which team is going to win at football in between drunken, nonsensical rants, except that for some reason posting the predictions on the internet makes him a celebrity instead of just a drunk curiosity.

One of the most worrying trends in AI has been how badly the experts have overestimated timelines.

On the other hand, I think human hubris naturally makes us dramatically overestimate how special brains are.

Off topic, but am I the only one who gets triggered every time I see a rationalist quantify their prediction of the future with single-digit accuracy? It's like their magic way of trying to get everyone to forget that they reached their conclusion in a completely hand-wavy way, just like every other human being. But instead of saying "low confidence" or "high confidence" like the rest of us normies, they will tell you they think there is a 16.27% chance, because they really, really want you to be aware that they know Bayes' theorem.

  • Interestingly, this is actually a question that's been looked at empirically!

    Take a look at this paper: https://scholar.harvard.edu/files/rzeckhauser/files/value_of...

    They took high-precision forecasts from a forecasting tournament and rounded them to coarser buckets (nearest 5%, nearest 10%, nearest 33%), to see if the precision was actually conveying any real information. What they found is that if you rounded the forecasts of expert forecasters, Brier scores got consistently worse, suggesting that expert forecast precision at the 5% level is still conveying useful, if noisy, information. They also found that less expert forecasters took less of a hit from rounding their forecasts, which makes sense.

    It's a really interesting paper, and they recommend that foreign policy analysts try to increase precision rather than retreating to lumpy buckets like "likely" or "unlikely".

    Based on this, it seems totally reasonable for a rationalist to make guesses with single digit precision, and I don't think it's really worth criticizing.
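
    As a toy version of the paper's rounding exercise, here is an illustrative Python sketch (simulated forecasts from an idealized well-calibrated forecaster, not the tournament data):

      import random

      random.seed(0)
      # for a calibrated forecaster, the true rate equals the stated probability
      forecasts = [random.uniform(0.01, 0.99) for _ in range(100_000)]
      outcomes = [random.random() < p for p in forecasts]

      def brier(preds):
          return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

      print(f"raw forecasts: Brier {brier(forecasts):.4f}")
      for step in (0.05, 0.10, 0.33):
          rounded = [round(p / step) * step for p in forecasts]
          print(f"rounded to nearest {step:.0%}: Brier {brier(rounded):.4f}")
      # coarser rounding tends to push the Brier score up (worse), echoing the paper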

    • Likely vs. unlikely is rounding to 50%. Single digit is rounding to 1%. I don't think the parent was suggesting the former is better than the latter. Even before I read your comment I thought that 5% precision is useful but 1% precision is a silly turn-off, unless that 1% is near the 0% or 100% boundary.

  • Would you also get triggered if you saw people make a bet at, say, $24 : $87 odds? Would you shout: "No! That's too precise, you should bet $20 : $90!"? For that matter, should all prices in the stock market be multiples of $1, (since, after all, fluctuations of greater than $1 are very common)?

    If the variance (uncertainty) in a number is large, the correct thing to do is to also report the variance, not to round the mean to a whole number.

    Also, in log odds, the difference between 5% and 10% is about the same as the difference between 40% and 60%. So using an intermediate value like 8% is less crazy than you'd think.

    People writing comments in their own little forum where they happen not to use sig-figs to communicate uncertainty is probably not a sinister attempt to convince "everyone" that their predictions are somehow scientific. For one thing, I doubt most people are dumb enough to be convinced by that, even if it were the goal. For another, the expected audience for these comments was not "everyone", it was specifically people who are likely to interpret those probabilities in a Bayesian way (i.e. as subjective probabilities).
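
    For what it's worth, the log-odds arithmetic checks out (a quick sketch):

      from math import log

      def logit(p):
          # log odds of a probability
          return log(p / (1 - p))

      print(logit(0.10) - logit(0.05))  # ~0.75
      print(logit(0.60) - logit(0.40))  # ~0.81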

    • > Would you also get triggered if you saw people make a bet at, say, $24 : $87 odds? Would you shout: "No! That's too precise, you should bet $20 : $90!"? For that matter, should all prices in the stock market be multiples of $1, (since, after all, fluctuations of greater than $1 are very common)?

      No.

      I responded to the same point here: https://news.ycombinator.com/item?id=44618142

      > correct thing to do is to just also report the variance

      And do we also pull this one out of thin air?

      Using precise numbers to convey extremely imprecise and ungrounded opinions is imho wrong, and to me unsettling. I'm pulling this purely out of my ass, and maybe I am making too much of it, but I feel this is in part what causes the many cases of very weird, borderline asocial/dangerous behaviour among some people associated with the rationalist movement. When you try to precisely quantify what cannot be quantified, and start trusting those numbers too much, you can easily be led to trust your conclusions way too much. I am 56% confident this is a real effect.

    • > If the variance (uncertainty) in a number is large, correct thing to do is to just also report the variance

      I really wonder what you mean by this. If I put my finger in the air and estimate the probability of AGI emerging at 13%, how do I get at the variance of that estimate? At face value, it is a number, not a random variable, and does not have a variance. If you instead view it as a "random sample" from the population of possible estimates I might have made, it does not seem well defined at all.

  • No, you are right, this hyper-numericalism is just astrology for nerds.

    • In the military they estimate distances this way when they don't have proper tools. Each person gives a min-max range, and the value where there's most overlap is taken. It's a reasonable way to make quick, intuition-based decisions when no other method is available.

  • If you take it with a grain of salt, it's better than nothing. In life, the best way to express an opinion is sometimes to quantify it based on intuition. To make decisions you could compile several experts' intuitive figures and take the median or similar. There are cases where it's more straightforward and rote: in the military, if you have to make distance-based decisions, you might ask 8 of your soldiers to each name the number they think the distance is and take the median.
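
    The procedure itself is a one-liner (Python sketch; the distances are hypothetical, in meters):

      import statistics

      estimates = [420, 450, 380, 500, 440, 410, 470, 430]  # eight soldiers' guesses
      print(statistics.median(estimates))  # 435.0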

  • No, you're definitely not the only one… 10% is OK, 5% maybe, 1% is useless.

    And while we're at it: why not give confidence intervals too?

  • >Off topic, but am I the only one getting triggered every time I see a rationalist

    The rest of the sentence is not necessary. No, you're not the only one.

  • You could look at 16% as roughly equivalent to a dice roll (1 in 6) or, you know, the odds you lose a round of Russian roulette. That's my charitable interpretation at least. Otherwise it does sound silly.

  • There is no honor in hiding behind euphemisms. Rationalists say 'low confidence' and 'high confidence' all the time, just not when they're making an actual bet and need to directly compare credences. And the 16.27% mockery is completely dishonest: the actual figures used barely a single significant figure.

    • > just not when they're making an actual bet

      That is not my experience talking with rationalists irl at all. And that is precisely my issue: it is pervasive in everyday discussion about any topic, at least with the subset of rationalists I happen to cross paths with. If it were just for comparing forecasting ability, or for bets, then sure, it would make total sense.

      Just the other day I had a conversation with someone about working in AI safety; it went something like: "well, I think there is a 10 to 15% chance of AGI going wrong, and if I join I have maybe a 1% chance of being able to make an impact, and if... and if... and if, so if we compare with what I'm missing by not going to <biglab> instead, I have 35% confidence it's the right decision."

      What makes me uncomfortable with this is that by using this kind of reasoning and coming out with a precise figure at the end, you cognitively bias yourself into being more confident in your reasoning than you should be, because we are all used to treating numbers as the output of a deterministic, precise, scientific process.

      There is no reason to say 10% or 15% rather than 8% or 20% for rogue AGI, and no reason to think one individual can change the direction by 1% rather than 0.3% or 3%. It's all just random numbers, so when you multiply a gut-feeling number by a gut-feeling number five times in a row, you end up with something absolutely meaningless, where the margin of error is basically 100%.
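
      To put a number on how fast that compounds, here is a quick Monte Carlo sketch (my own toy assumption: five chained gut-feeling factors, each only known to within a factor of two):

        import random

        random.seed(1)

        def chained_estimate():
            x = 0.125 * 0.01 * 0.5 * 0.5 * 0.5  # five gut-feeling factors multiplied
            for _ in range(5):                   # each factor off by up to 2x either way
                x *= 2 ** random.uniform(-1, 1)
            return x

        samples = sorted(chained_estimate() for _ in range(100_000))
        lo, hi = samples[2_500], samples[97_500]
        print(f"central 95% of outcomes: {lo:.1e} to {hi:.1e} (~{hi / lo:.0f}x spread)")
        # the 95% band spans roughly a 30x ratio even though each input felt "about right"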

      But it somehow feels more scientific and reliable because it's a precise number, and I think this is dishonest and misleading, both to the speakers themselves and to listeners. "Low confidence" or "I'm really not sure, but I think..." have the merit of not hiding a gut-feeling process behind a scientific veil.

      To be clear, I'm not saying you should never use numbers to try to quantify gut feelings; it's OK to say "I think there is maybe a 10% chance of rogue AGI, and thus I want to do this or that." What I really don't like is the stacking of multiple such random predictions while trying to reason in good faith from the result.

      > And the 16.27% mockery is completely dishonest.

      Obviously satire
