
Comment by HarHarVeryFunny

3 days ago

No doubt LLMs and tooling will continue to improve, and the best use cases for them will become better understood, but what Ilya seems to be referring to is the massive disconnect between headline-grabbing benchmarks such as "AI performs at PhD level on math" and the real-world stupidity of these models, such as his example of a coding agent toggling between generating bug #1 and bug #2. That disconnect largely explains why the current economic and visible impact is much less than it would be if the "AI is PhD level" benchmark narrative were actually true.

Calling LLMs "AI" makes them sound much more futuristic and capable than they actually are, and such a meaningless term invites extrapolation to equally meaningless terms like AGI, and to visions of human-level capability.

Let's call LLMs what they are - language models - tools for language-based task automation.

Of course we eventually will do this. Fuzzy, meaningless names like AI/AGI will always be reserved for the cutting-edge technology du jour, while older tech that is realized in hindsight to be much more limited reverts to being called by more specific names such as "expert system", "language model", etc.

There is actually an interesting scenario in this disconnect that we are experiencing. Maybe "real" AGI, in the sense of intelligence that self-corrects effectively like a human, is still a long way off. Maybe we will be stuck with the kind of ever-improving but still somewhat deficient LLM intelligence we have right now.

There are tons of use cases even for such a limited type of intelligence. No, it is not a million math PhDs at your disposal. It is a narrow intelligence that is still hugely useful, and businesses will need a few years to adapt. The impact on areas like customer service, with LLM+RAG+triggering actions, is very close already and should transform the industry in the coming years.
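The LLM+RAG+triggering-actions pattern mentioned above can be sketched roughly as: retrieve relevant documents, pass them plus the customer query to a model, and dispatch any action the model requests. A minimal toy sketch, with the model call stubbed out and naive keyword-overlap retrieval standing in for a real vector search (all names here are illustrative, not any real API):

```python
def retrieve(query, docs, k=1):
    """Naive retrieval: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def stub_llm(prompt):
    """Stand-in for a real model call; decides whether to request an action."""
    if "refund" in prompt.lower():
        return {"action": "issue_refund", "reply": "I've started your refund."}
    return {"action": None, "reply": "Here is what I found."}

# Registry of back-office actions the model is allowed to trigger.
ACTIONS = {"issue_refund": lambda: "refund_ticket_created"}

def handle(query, docs):
    context = "\n".join(retrieve(query, docs))
    out = stub_llm(f"Context: {context}\nCustomer: {query}")
    if out["action"] in ACTIONS:
        out["result"] = ACTIONS[out["action"]]()  # trigger the action
    return out

docs = ["Refund policy: refunds within 30 days.", "Shipping takes 3-5 days."]
print(handle("I want a refund for my order", docs))
```

A production version would swap in embedding-based retrieval, a real model API, and guardrails around which actions can fire, but the control flow is essentially this loop.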

  • Yes - LLMs are useful, even if auto-regressively trained GPTs aren't the answer to human intelligence, and outside of software development (maybe there too) it seems we're still very early in companies trying to figure out what they can and cannot usefully be used for.

    It seems the LLM companies generating all the hype (mostly OpenAI & Anthropic) may be shooting themselves in the foot a bit here, raising false expectations of what LLMs can do, or soon will be able to do, and therefore encouraging all the misapplication and failed corporate projects that are currently happening. Anthropic are talking out of both sides of their mouth here: saying that AGI is imminent and about to replace developers and remote workers, yet acknowledging that the technology and use-case selection are so fickle that corporations aren't likely to be successful without 1-on-1 guidance from Anthropic.

    The mythical AGI, an artificial human, will presumably be transformative if/when it ever arrives, but even if we're still in the early days of LLM adoption, it's not clear that LLMs themselves will be. Developers get a new tool to use, consumers get a new frustrating AI customer service to deal with, corporate e-mails, marketing literature and powerpoints become enshittified LLM-generated AI slop, etc. Maybe the biggest "transformative" (widely felt) impact of LLMs is chatbots and AI search, but people seem to be taking those in their stride, and it's not obvious that the experience and impact of them is going to change much going forwards.

> the real-world stupidity of these models such as his example of a coding agent toggling between generating bug #1 vs bug #2, which in fact largely explains why the current economic and visible impact is much less than if the "AI is PhD level" benchmark narrative was actually true.

This could have been true in the past, but in recent weeks I have started to trust top AI models more and more, and the PhDs I work with less. The quality jump is very real imo.

  • Are you a mathematician? I'm not an expert in the math field, but it seems like they are hitting the same issues everyone else has: current LLMs still more or less need to be supervised by an expert, and struggle to do something actually novel or to build out a complicated proof correctly.

    • There's a limit to how much novelty you're going to get from an LLM, especially in areas like programming and math where they've been heavily RL'd NOT to be novel, even to the extent that the base model supports it, and instead to generate much narrower, more prescribed outputs.

      The limit to the novelty you are going to get from an LLM is essentially the "deductive/generative closure" of the training data. To be truly novel and move past the limits of your own past experience requires things like curiosity, continual learning, and the autonomy/agency to explore and learn.


    • I work in a math-heavy applied setting. Randomly hired PhDs also need to be supervised, their end results monitored and their code reviewed, or they will make lots of mistakes. My view is that if you throw out a problem like "build an optimization model for this kind of problem on this kind of data", LLMs may produce better results.