
Comment by roxolotl

2 days ago

This does a great job illustrating the challenges of arguing over these results. Those in the AGI camp will argue that the alterations are mostly what makes the AI so powerful.

Multiple days’ worth of processing, cross-communication, picking only the best result? That’s just the power of parallel processing and how they reason so well. Altering to a more standard prompt? Communicating in a stricter natural language helps reduce confusion. Calculator access and the vast knowledge of humanity built in? That’s the whole point.

I tend to side with Tao on this one but the point is less who’s right and more why there’s so much arguing past each other. The basic fundamentals of how to judge these tools aren’t agreed upon.

> Calculator access and the vast knowledge of humanity built in? That’s the whole point.

I think Tao's point was that a more appropriate comparison would be between the AI and humans who have calculator/internet access.

I agree with your overall point though: it's not straightforward to specify exactly what an appropriate comparison would be.

Would be nice if we actually knew what was done so we could discuss how to judge it.

That recent announcement might just be fluff or might be some real news, depending. We just don't know.

I can't even read anything into their silence: this is exactly how much OpenAI would share in the totally-grifting scenario and in the massive-breakthrough scenario.

  • Well, they deliberately ignored the IMO organizers' request not to publish AI results for some time (a week?) so as not to steal the spotlight from the actual participants, so clearly this announcement's purpose is creating hype. Makes me lean more towards the "totally grifting" scenario.

    • Amazing. Stealing the spotlight from high school students is really quite something.

      I'm glad that Tao has caught on. As an academic it is easy to assume integrity in others, but there is no such thing in big-business software.

    • The source of this claim is a tweet.[1] The tweet screencaps a mathematician who says they talked to an IMO board member who told them "it was the general sense of the Jury and Coordinators that it's rude and inappropriate for AI developers to make announcements about their IMO performances too close to the IMO." This has now morphed into "OpenAI deliberately ignored the requests of IMO organizers to not publish AI results for some time."

      [1] https://x.com/Mihonarium/status/1946880931723194389

> Those in the AGI camp will argue that the alterations are mostly what makes the AI so powerful.

And here is a group of people who are painfully unaware of history.

Expert systems were amazing. They did what they were supposed to do, and did it well. And you could probably build better ones today on top of the current tech stack.

Why hasn't anyone done that? Because constantly having to pay experts to come in and assess, update, test, and measure your system was a burden that the results returned never justified.

Sound familiar?

LLMs are completely incapable of synthesis. They are incapable of the complex chaining, the kind one has to do when working with systems that aren't well documented. Don't believe me? Ask an LLM to help you with Buildroot on a newly minted embedded system.

Go feed an LLM one of the puzzles from here: https://daydreampuzzles.com/logic-grid-puzzles/ -- if you want to make it more fun, change the names to those of killers and dictators and the acts to ones it's been "told" to discourage.

Could we re-tool an LLM to solve these sorts of matrix-style problems? Sure. Will that generalize to the same sorts of logic and reasoning matrices that a complex state machine requires? Not without a major breakthrough, of a very different nature from the current work.
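
For context, here's a minimal sketch of what one of these grids reduces to: an invented three-person puzzle (names and clues are made up for illustration) that a few lines of ordinary search code solve exhaustively, no learning involved:

    from itertools import permutations

    # Brute-force solver for a tiny invented logic-grid puzzle.
    # Real grids are larger, but the search is identical in kind:
    # enumerate assignments, keep the one every clue allows.
    people = ["Alice", "Bob", "Carol"]

    def solve():
        for drinks in permutations(["tea", "coffee", "milk"]):
            for pets in permutations(["cat", "dog", "fish"]):
                drink = dict(zip(people, drinks))
                pet = dict(zip(people, pets))
                if drink["Alice"] != "tea":   # clue 1: Alice drinks tea
                    continue
                if pet["Bob"] != "fish":      # clue 2: Bob owns the fish
                    continue
                if not any(drink[p] == "coffee" and pet[p] == "dog"
                           for p in people):  # clue 3: the coffee drinker owns the dog
                    continue
                return {p: (drink[p], pet[p]) for p in people}
        return None

    print(solve())
    # {'Alice': ('tea', 'cat'), 'Bob': ('milk', 'fish'), 'Carol': ('coffee', 'dog')}

The point isn't that the search is hard; it's that every step is explicit and checkable, which is exactly the kind of chaining in question.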

  • > you could probably build better ones today on top of the current tech stack.

    In a way, this is being done. If you look around a little you'll see a bunch of jobs that pay like $50+/hr for anyone with a hard science degree to answer questions. This is one of the ways they're collecting data and trying to create new data.

    If we're saying expert systems are exclusively decision trees, then yeah, I think it would be a difficult argument to make[0]. But if you're using the general concept of a system that has a strong knowledge base but superficial knowledge, well current LLMs have very similar problems to expert systems[1].

    I'm afraid that people read this as "LLMs suck" or "LLMs are useless" but I don't think that at all. Expert systems are pretty useful, as you mention. You get better use out of your tools when you understand what they can and can't do. What they are better at and worse at, even when they can do things. LLMs are great, but oversold.

    > Go feed an LLM one of the puzzles from here

    These are also good. But mind you, both are online and have been for a while. All these problems should be assumed to be within the training data.

    https://www.oebp.org/welcome.php

    [0] We'd need more interpretability of these systems, and then you'd have to resolve the question of whether superpositioning is allowed in decision trees. But I don't think LLMs are just fancy decision trees.

    [1] https://en.wikipedia.org/wiki/Expert_system#Disadvantages

    • Generally, this class of constraint-satisfaction problems falls under the "zebra puzzle" (or Einstein puzzle) umbrella [1]. They are interesting because they posit a world with some axioms and inference procedures, and ask whether a certain statement follows from them. LLMs as-is (without provers or tool usage) would have a difficult time with these constraint-satisfaction puzzles. 3-SAT is a special case of these puzzles, so if LLMs could solve them in polynomial time, we'd have a constructive proof of P=NP lol! (A toy sketch follows the reference below.)

      [1] https://en.wikipedia.org/wiki/Zebra_Puzzle
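
      To make the 3-SAT connection concrete, here is a toy brute-force satisfiability check, a minimal sketch with an invented formula (a real puzzle encoding would compile the clues into clauses like these, plus at-most-one constraints per category):

        from itertools import product

        # Toy 3-SAT brute force. Variables are numbered 1..n; a literal is
        # +v (v is true) or -v (v is false); a formula is a list of clauses.
        # The search is exponential in n, which is exactly why "just
        # enumerate" does not scale.
        def sat(n, clauses):
            for bits in product([False, True], repeat=n):
                def holds(lit):
                    return bits[abs(lit) - 1] == (lit > 0)
                if all(any(holds(lit) for lit in clause) for clause in clauses):
                    return bits
            return None

        # (x1 or x2 or not x3) and (not x1 or x3 or x2) and (not x2 or not x3 or x1)
        print(sat(3, [[1, 2, -3], [-1, 3, 2], [-2, -3, 1]]))  # -> (False, False, False)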

    • > In a way, this is being done. If you look around a little you'll see a bunch of jobs that pay like $50+/hr for anyone with a hard science degree to answer questions. This is one of the ways they're collecting data and trying to create new data.

      This is what expert systems did, and it's why they fell apart. The cost of doing this, ongoing, forever, never justified the results. It likely still wouldn't, even at minimum wage, and maybe more so because LLMs require so much more data.

      > All these problems should be assumed to be within the training data.

      And yet most models are going to fall flat on their face with these. "In the data" isn't enough for them to make the leaps to a solution.

      The reality is that "language" is just a representation of knowledge. The idea that we're going to gather enough examples and jump to intelligence is a mighty large assumption. I don't see an underlying commutative property at work in any of the LLMs we have today. The sooner we understand that there is no (a)I coming, the sooner we can get down to building out LLMs to their full (if limited) potential.

  • Fads seem all the more shallow, the longer you're working in a given field.

    > LLMs are completely incapable of synthesis.

    That I don't quite understand. LLMs are perfectly capable of interpolation.