Comment by kumarvvr

2 days ago

I have an issue with the words "understanding", "reasoning", etc. when talking about LLMs.

Are they really understanding, or putting out a stream of probabilities?

Does it matter from a practical point of view? It's either true understanding or it's something else that's similar enough to share the same name.

  • The polygraph is a good example.

    The "lie detector" is used to misguide people, the polygraph is used to measure autonomic arousal.

    I think these misnomers can cause real issues like thinking the LLM is "reasoning".

    • Agreed, but in the case of the lie detector it seems to be a matter of interpretation. In the case of LLMs, what is it? Is it a matter of saying "it's a next-word calculator that uses statistics, matrices, and vectors to predict output" instead of "a reasoning simulation built on a neural network"? Is there a better name? I'd say it's "a static neural network that outputs a stream of words after having consumed textual input, and that can be used to simulate, with a high level of accuracy, the internal monologue of a person thinking about and reasoning on the input". Whatever it is, it's not reasoning, but it's not a parrot either. (A rough sketch of that next-word loop follows below.)
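
For what it's worth, the "next-word calculator" description above can be made concrete with a minimal sketch (toy vocabulary and random scores standing in for a real model's forward pass; none of the names come from any particular library): at each step the model produces a probability distribution over the vocabulary, and the stream of words is just repeated sampling from those distributions.

```python
import math
import random

def next_token_distribution(tokens):
    # Stand-in for a real transformer forward pass: returns a raw score
    # (logit) for every word in a toy vocabulary, given the tokens so far.
    vocab = ["the", "patient", "has", "a", "fever", "."]
    logits = [random.gauss(0.0, 1.0) for _ in vocab]
    return vocab, logits

def softmax(logits):
    # Turn raw scores into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt_tokens, n_steps=5):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        vocab, logits = next_token_distribution(tokens)
        probs = softmax(logits)
        # One distribution per step, collapsed to a single word by
        # sampling: this is the "stream of probabilities".
        tokens.append(random.choices(vocab, weights=probs, k=1)[0])
    return tokens

print(generate(["the", "patient"]))
```

A real model replaces the random scores with a learned function of the entire input so far, but the shape of the loop is the same.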

The latter. When "understand", "reason", "think", "feel", "believe", or any of a long list of similar words is in a title, it immediately makes me think the author already drank the Kool-Aid.

  • In the context of coding agents, they do simulate “reasoning”: when you feed them the output of running their code, they are able to correct themselves (see the sketch after this list).

  • I agree with “feel” and “believe”, but what words would you suggest instead of “understand” and “reason”?

  • Kool-Aid or not, "reasoning" is already part of the LLM verbiage (e.g. `reasoning` models having a `reasoningBudget`). The meaning might not be 1:1 with human reasoning, but when the LLM shows its "reasoning" it does _appear_ to be a train of thought. If I had to give what it's doing a name (like I'm naming a function), I'd be hard pressed not to go with something like `reason`/`think`.
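
To make the coding-agent point above concrete, here's a minimal, self-contained sketch of that feedback loop (`llm_complete` and `run_code` are toy stand-ins, not any real framework's API): generate code, run it, feed the failure back into the prompt, and let the model revise its own attempt.

```python
def llm_complete(prompt: str) -> str:
    # Toy stand-in for a real LLM call: the first attempt has a bug,
    # and any prompt that contains an error report gets a fixed version.
    if "Error" in prompt or "Traceback" in prompt:
        return "print('hello world')"
    return "print(undefined_name)"

def run_code(code: str) -> tuple[bool, str]:
    # Toy stand-in for running generated code in a sandbox and capturing errors.
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return True, ""
    except Exception as e:
        return False, repr(e)

def coding_agent(task: str, max_rounds: int = 3) -> str:
    code = llm_complete(f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        ok, output = run_code(code)
        if ok:
            break
        # The "self-correction" step: the model sees its own failure
        # and produces a revised attempt.
        code = llm_complete(
            f"Task:\n{task}\n\nYour previous code:\n{code}\n"
            f"It failed with:\n{output}\nFix it."
        )
    return code

print(coding_agent("print a greeting"))
```

Whether that loop deserves the name "reasoning" is exactly the naming question in this thread; mechanically, it's an LLM call inside a retry loop that sees its own previous output and the resulting error.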

What does understanding mean? Is there a sensible model for it? If not, we can only judge in the same way that we judge humans: by conducting examinations and determining whether the correct conclusions were reached.

Probabilities have nothing to do with it; by any appropriate definition, there exist statistical models that exhibit "understanding" and "reasoning".

Do you yourself really understand, or are you just depolarizing neurons that have reached their threshold?

  • It can be simultaneously true that human understanding is just a firing of neurons and that the architecture and function of those neural structures are vastly different from what an LLM is doing internally, such that they are not really the same. I’d encourage you to read Apple’s recent paper on thinking models; I think it’s pretty clear that the way LLMs encode the world is drastically inferior to what the human brain does. I also believe that could be fixed with the right technical improvements, but it just isn’t the case today.

OK, we've removed all understanding from the title above.

  • Care to provide reasoning as to why?

    • The article's title was longer than 80 chars, which is HN's limit. There's more than one way to truncate it.

      The previous truncation ("From GPT-4 to GPT-5: Measuring Progress in Medical Language Understanding") was baity in the sense that the word 'understanding' was provoking objections and taking us down a generic tangent about whether LLMs really understand anything or not. Since that wasn't about the specific work (and since generic tangents are basically always less interesting*), it was a good idea to find an alternate truncation.

      So I took out the bit that was snagging people ("understanding") and instead swapped in "MedHELM". Whatever that is, it's clearly something in the medical domain and has no sharp edge of offtopicness. Seemed fine, and it stopped the generic tangent from spreading further.

      * https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
