Comment by chrisjj

18 days ago

> The finance friend and the LLM made the same mistake: they evaluated the text without modelling the world it would land in.

Major error. The LLM made that text without evaluating it at all. It just parroted words it previously saw humans use in superficially similar contexts.

I think this debate is mis-aimed. Both sides are right about different things, and wrong in the same way.

The mistake is treating “model” as a single property, instead of separating cognition from decision.

LLMs clearly do more than surface-level word association. They encode stable relational structure: entities, roles, temporal order, causal regularities, social dynamics, counterfactuals. Language itself is a compressed record of world structure, and models trained on enough of it inevitably internalize a lot of that structure. Calling this “just a word model” undersells what’s actually happening internally.

At the same time, critics are right that these systems lack autonomous grounding. They don’t perceive, act, or test hypotheses against reality on their own. Corrections come from training data, tools, or humans. Treating their internal coherence as if it were direct access to reality is a category error.

But here’s the part both sides usually miss: the real risk isn’t representational depth, it’s authority.

There’s a difference between:

- cognition: exploring possibilities, tracking constraints, simulating implications, holding multiple interpretations; and
- decision: collapsing that space into a single claim about what is, what matters, or what someone thinks.

LLMs are quite good at the first. They are not inherently entitled to the second.

Most failures people worry about don’t come from models lacking structure. They come from models (or users) quietly treating cognition as decision:

- coherence as truth,
- explanation as diagnosis,
- simulation as fact,
- “this sounds right” as “this is settled.”

That’s why “world model” language is dangerous if it’s taken to imply authority. It subtly licenses conclusions the system isn’t grounded or authorized to make—about reality, about causation, or about a user’s intent or error.

A cleaner way to state the situation is:

> These systems build rich internal representations that are often world-relevant, but they do not have autonomous authority to turn those representations into claims without external grounding or explicit human commitment.

Under that framing:

- The “word model” camp is right to worry about overconfidence and false grounding.
- The “world model” camp is right that the internal structure is far richer than token statistics.

They’re arguing about different failure modes, but using the same overloaded word.

Once you separate cognition from decision, the debate mostly dissolves. The important question stops being “does it understand the world?” and becomes “when, and under what conditions, should its outputs be treated as authoritative?”

That’s where the real safety and reliability issues actually live.