Comment by paxys

2 days ago

Very convenient to put "AGI" in all these agreements because the term is fundamentally undefinable. So throw out whatever numbers you want and fight about it and backtrack later.

> fundamentally undefinable

Incredible how an entire religion has sprung up around AGI.

The problem with AGI is not that it's undefinable, but that everyone defines it differently. Kinda like consciousness in that regard.

Fortunately, OpenAI already wrote theirs down. Well, Microsoft[0] says they did, anyway. Some people claimed it was a secret only a few years ago, and LLMs have since made it much harder to tell leaks from hallucinated news, but there is at least a claim of a leak[1].

[0] https://blogs.microsoft.com/blog/2026/02/27/microsoft-and-op...

[1] It talks about it, but links to a paywalled site, so I still don't know what it is: https://techcrunch.com/2024/12/26/microsoft-and-openai-have-...

The definition used to be "passes the Turing test" .. until LLMs passed it.

  • Extremely debatable. Especially because there is no "The Turing Test" [0], only a game and a few example exchanges described by Turing. I recommend reading the original paper before making bold claims about it. The bar for the interrogator has certainly been raised, but considering:

    - the prevalence of "How many |r|'s are in the word 'strawberry'?"-style questions that cause(d) LLMs to stumble

    - context window issues

    It would be naive to claim that there does not exist, or even that it would be difficult to construct/train, an interrogator that could reliably distinguish between an LLM and a human chat instance.

    [0]: https://archive.computerhistory.org/projects/chess/related_m...
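
    For reference, strawberry-style questions have a one-line ground truth, which is part of why they make such a cheap probe for an interrogator: a human answers instantly, while tokenized models often fumble. A minimal sketch (the function name is just illustrative):

    ```python
    # Letter-counting has a trivially verifiable answer, so an
    # interrogator can check the respondent's reply automatically.
    def count_letter(word: str, letter: str) -> int:
        return word.lower().count(letter.lower())

    print(count_letter("strawberry", "r"))  # 3
    ```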

  • Sure, when the expected monetary value was 0. Then they started claiming that investing $1,000,000,000,000.00 (that's $1T) into a 4-year-old startup was a good idea. Change the valuation, change the goal. Then the goal was to be better than human employees (or at least more efficient, or even just efficiency-improving), because without that the value of the LLM is far lower than what it is being sold as. All the research so far says that LLMs fall far short of that goal. And if this was someone else's money, fine. But this is basically everyone's retirement savings. Again, higher valuation, higher goal. Finally, when you start losing people's retirement savings, criminal penalties start getting attached to things.

  • I mean… just ask about something "naughty" and they'll fail? At the very least you'd need a setup without safeguards to pass any Turing test…

    The Turing test could also be considered equivalent to "can humans come up with questions that break the AI?" and the answer to that is still yes I'd say.

  • It hasn't even passed the original Turing test, depending on the question. There are an unlimited number of questions that cause LLMs to give inhuman-looking answers.

    As for writing in general, the slop score is still higher than the human baseline for all models[1], so all a human tester has to do is grade the writing and make the human counterpart write a bunch; the interrogator is allowed to submit an arbitrarily long list of questions.

    [1] https://eqbench.com/slop-score.html