Comment by motorest

9 months ago

> LLMs don't understand. It's mind-boggling to me that large parts of the tech industry think that.

I think you might be tied to a definition of "understanding" that doesn't really apply.

If you prompt an LLM with ambiguous instructions, it asks you to clarify (i.e., to extend the prompt with more context), and once you do, it outputs something that exactly meets the goals of the initial prompt. Does that count as understanding?

If it walks like a duck and quacks like a duck, it's a duck, or something so close to a duck that we'd be better off calling it that.

> If you prompt an LLM with ambiguous instructions, it asks you to clarify (i.e., to extend the prompt with more context)

It does not understand that it needs clarification. This behavior is a replicated pattern.

  • So you have two prompts, one is ambiguous and the second is the same prompt but with the ambiguity resolved.

    In the first prompt the replicated pattern is to ask for clarification; in the second prompt the replicated pattern is to perform the work. The machine might understand nothing, but does it matter when it responds appropriately to the different cases?

    I don't really care whether it understands anything at all, I care that the machine behaves as though it did have understanding.

    • > So you have two prompts, one is ambiguous and the second is the same prompt but with the ambiguity resolved.

      No. You have an initial prompt that is vague, and then you have another prompt that is more specific.

      - "draw me an automobile"

      - "here's a picture of an ambulance."

      - "could you make it a convertible instead? Perhaps green."

      - "ok, here's a picture of a jaguar e-type".

  • What is the difference? What would actual understanding look like?

    • Your question is an example of the difference.

      Your question can be rephrased to “what would an actual difference look like.”

      However, what you are asking underneath that is a mix of “what is the difference” and “what is the PRACTICAL difference in terms of output”.

      Or in other words: if the output looks like what someone with understanding would say, how is it meaningfully different?

      ---

      Humans have a complex model of the world underlying their thinking. When I am explaining this to you, you are (hopefully) not just learning how to imitate my words. You are figuring out how to actually build a model of an LLM that creates intuitions/predictions of its behavior.

      In analogy terms, learning from this conversation (understanding) is like creating a bunch of LEGO blocks in your head, which you can then reuse and rebuild according to the rules of LEGO.

      One of those intuitions is that humans can hallucinate: they can have a version of reality in their head which they take to be accurate and predictive of physical reality, yet when they are ill they can end up interpreting their sensory input as indicating a reality that doesn’t exist. Or they can lie.

      Hallucinations are a good transition point to move back to LLMs, because LLMs cannot actually hallucinate, or lie. They are always “perceiving” their mathematical reality, and always faithfully producing outputs.

      If we are to anthropomorphize it back to our starting point about “LLMs understand”, this means that even when LLMs “hallucinate” or “lie”, they are actually being faithful and honest, because they are not representing an alternate reality. They are precisely returning values based on the previous values fed into the system.

      “LLMs understand” is misleading, and it trojan-horses in a concept of truth (and therefore untruth) and other intuitions that are invalid.

      ---

      However, understanding this does not necessarily change how you use LLMs 90% of the time; it just changes how you model them in your head, resulting in a closer match between observed reality and your predictive reality.

      For the LLM this makes no difference, because it’s forecasting the next words the same way.


    • It depends on which human feedback was used to train the model. For humans, there are various communication models, like the four-sides model. If the dataset has annotations for the specific facets of the communication model, then an LLM trained on this dataset will have specific probabilities that replicate that communication model. You may call this understanding what the prompter says, but to me it's just replication.
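
      As a purely hypothetical sketch of what such annotations could look like (the field names and the example row below are invented, not taken from any real dataset), a fine-tuning example annotated with the four sides might be:

        # Hypothetical fine-tuning row annotated with the four facets of the
        # four-sides model; all names and values are illustrative only.
        annotated_example = {
            "utterance": "The build has been red for two days.",
            "facets": {
                "factual": "CI has failed on every commit for 48 hours.",
                "self_revelation": "The speaker is frustrated and blocked.",
                "relationship": "The speaker expects the listener to act.",
                "appeal": "Please fix or revert the breaking change.",
            },
            "preferred_response": "I'll revert the breaking commit and follow up.",
        }

        # A model trained on many such rows assigns higher probability to
        # responses that address the annotated facets, i.e. it replicates the
        # communication model whether or not we call that "understanding".
        print(annotated_example["facets"]["appeal"])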

    • This isn’t a complete answer, but my short list for moving the tech many steps forward would be:

      * replying with “I don’t know” a lot more often

      * consistent responses based on the accessible corpus

      * far fewer errors (hallucinations)

      * being able to beat Pokémon reliably and in a decent time frame without any assistance or prior knowledge about the game or gaming in general (Gemini 2.5 Pro had too much help)

> If it walks like a duck and quacks like a duck, it's a duck, or something so close to a duck that we'd be better off calling it that.

Saying “LLMs match understanding well enough” is to make the same core error as saying “rote learning is good enough” in a conversation about understanding a subject.

The issue is that they can pass the test(s), but they don't understand the work. That is the problem with a purely utilitarian measure of output.

[flagged]

  • > If you define a grammar for a new programming language and feed it to an LLM and give it NO EXAMPLES can it write code in your language?

    Yes. If you give a model with a 2024 training cutoff the documentation for a programming language written in 2025, it is able to write code in that language.

    • I have found this not to work particularly well in practice. Maybe I’m holding it wrong? Do you have any examples of this?

  • In my experience it generally has a very good understanding and does generate the relevant test cases. Then again I don't give it a grammar, I just let it generalize from examples. In my defense I've tried out some very unconventional languages.

    Grammars are an attempt at describing a language. A broken attempt if you ask me. Humans also don't like them.

    • For natural language you are right. The language came first; the grammar was retrofitted to try to find structure.

      For formal languages, of which programming languages (and related ones like query languages, markup languages, etc.) are an instance, the grammar defines the language. It comes first, examples second (see the toy sketch at the end of this comment).

      Historically, computers were very good at formal languages. With LLMs we are entering a new age where machines are becoming terrible at something they once excelled at.

      Have you lately tried asking Google whether it's 2025? The very first data keeping machines (clocks) were also pretty unreliable at that. Full circle I guess.
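
      To make "grammar first, examples second" concrete, here is a minimal sketch of the experiment being debated, assuming an invented toy language and invented prompt wording (it only shows what "feed it a grammar and NO examples" means, not how any particular model would respond):

        # A brand-new toy language defined only by its (invented) EBNF grammar.
        TOY_GRAMMAR = """
        program   = { statement } ;
        statement = "let" ident "=" expr ";" ;
        expr      = term { ("+" | "-") term } ;
        term      = number | ident ;
        ident     = letter { letter } ;
        number    = digit { digit } ;
        """

        def zero_shot_prompt(grammar: str) -> str:
            # Deliberately include zero example programs: the model only
            # ever sees the formal definition of the language.
            return (
                "Here is the EBNF grammar of a new programming language:\n"
                + grammar
                + "\nWrite a short program in this language that sums three numbers."
            )

        print(zero_shot_prompt(TOY_GRAMMAR))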

  • > NO.

    YES! Sometimes. You’ll often hear the term “zero-shot generation”, meaning creating something new given zero examples; this is something many modern models are capable of.

  • > If you define a grammar for a new programming language and feed it to an LLM and give it NO EXAMPLES can it write code in your language?

    Neither can your average human. What's your point?

  • > If you define a grammar for a new programming language and feed it to an LLM and give it NO EXAMPLES can it write code in your language?

    Of course it can. It will experiment and learn just like humans do.

    Hacker News people still think LLMs are just some statistical model guessing things.

    • > Hacker News people still think LLMs are just some statistical model guessing things.

      That's exactly what they are. It's the definition of what they are. If you are talking about something that is doing something else, then it's not an LLM.
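
      A minimal sketch of what "a statistical model guessing things" means mechanically (the context, vocabulary, and probabilities below are made up; a real model conditions on the full context over a huge vocabulary):

        import random

        # Toy stand-in for the distribution a model might assign to the next
        # token after the context "the cat sat on the" (numbers invented).
        next_token_probs = {"mat": 0.62, "floor": 0.21, "roof": 0.09, "keyboard": 0.08}

        def sample_next_token(probs):
            # Generation is weighted sampling from this distribution,
            # repeated one token at a time.
            tokens, weights = zip(*probs.items())
            return random.choices(tokens, weights=weights, k=1)[0]

        print("the cat sat on the", sample_next_token(next_token_probs))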
