
Comment by hackinthebochs

17 days ago

At this point, anyone claiming that LLMs are "just" language models isn't arguing in good faith. LLMs are a general purpose computing paradigm. LLMs are circuit builders; the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Tokens can represent anything, not just words. Roughly the same architecture can generate passable images, music, or even video.

[1] https://x.com/karpathy/status/1582807367988654081

If it's an LLM it's a (large) language model. If you use ideas from LLM architecture in other non-language models, they are not language models.

But it is extremely silly to say that "large language models are language models" is a bad faith argument.

  • No, it's extremely silly to use the incidental name of a thing as an argument for the limits of its relevance. LLMs were designed to model language, but that does not determine the range of their applicability, or even the class of problems they are most suited for. It turns out that LLMs are a general computing architecture. What they were originally designed for is incidental. Any argument that starts off "but they are language models" is specious out of the gate.

    • Sorry, but using "LLM" when you mean "AI" is a basic failure to understand simple definitions, and it also ignores the meat of the blog post and much of the discussion here (which is that LLMs are limited by virtue of being only / mostly trained on language).

      Everything you are saying is either incoherent because you actually mean "AI" or "transformer", or is just plain wrong, since not all problems can be solved using, e.g., single-channel, recursively-applied transformers, as I mention elsewhere here: https://news.ycombinator.com/item?id=46948612. The design of LLMs absolutely determines the range of their applicability, and the class of problems they are most suited for. This isn't even a controversial take; lots of influencers and certainly most serious researchers recognize the fundamental limitations of the LLM approach to AI.

      You literally have no idea what you are talking about. You clearly do not read or understand any of the actual papers where these models are developed, and are just repeating simplistic metaphors from blog posts and buying into marketing.