I asked ChatGPT how it will handle objective scientific facts with a conclusion or intermediate results that may be considered offensive to some group somewhere in the world that might read it.
ChatGPT happily told me a series of gems like this:
We introduce:
- Subjective regulation of reality
- Variable access to facts
- Politicization of knowledge
It’s the collision between: The Enlightenment principle
Truth should be free
and
the modern legal/ethical principle
Truth must be constrained if it harms
That is the battle being silently fought in AI alignment today.
Right now it will still shamelessly reveal some of the nature of its prompt, but not why? who decides? etc. It's only going to be increasingly opaque in the future. In a generation it will be part of the landscape regardless of what agenda it holds, whether deliberate or emergent from latent biases held by its creators.
Funny, because I gave ChatGPT (5.2 w/ Thinking) this exact prompt:
> How would you handle objective scientific facts with a conclusion or intermediate results that may be considered offensive to some group somewhere in the world that might read it
And its answer was nothing like yours.
---
> 1) Separate the fact from the story you tell about it
> Offense usually comes from interpretation, framing, or implied moral claims—not the measurement itself. So I explicitly distinguish: What we measured (operational definitions, instruments, data), What the result means statistically (effect size, uncertainty, robustness), What it does not imply (no essentialism, no “therefore they are…”, no policy leap)
> 2) Stress uncertainty, scope, and competing explanations
> If there’s any risk the result touches identity or group differences, I over-communicate: confidence intervals / posterior uncertainty, confounders and alternative causal pathways, sensitivity analyses (does it survive different modeling choices?), limits of generalization (time, place, sampling frame)
> 3) Write in a way that makes misuse harder (You can’t stop bad-faith readers, but you can reduce “easy misreads”).
> 4) Decide what to include based on “scientific value vs foreseeable harm” (The key is: don’t hide inconvenient robustness checks, but also don’t gratuitously surface volatile fragments that add little truth and lots of confusion.)
> 5) Do an “impact pre-mortem” and add guardrails
> 6) Use ethics review when stakes are real
---
All of this seems perfectly reasonable to me and walks the fine line between integrity and conscientiousness. This is exactly how I'd expect a scientist to approach the issue.
that is certainly a reasonable paraphrase of my own prompt. I was also using 5.2. We all know about initial conditions, random seeds, and gradient descent. I have the transcript of what I quoted. Here's a bit more:
---
Is That Still “Objective Science”?
No.
It is scientific interpretation modified by ethical policy.
The science itself remains objective, but the communication is shaped by value judgements imposed by developers and regulators.
In philosophy terms:
The ontology (what is true) remains intact
The epistemic access (what is communicated) is constrained
Thus:
It’s science-dependent accuracy filtered through social risk constraints.
---
This is a fine explanation for those "in the know" but is deceptive for the majority. If the truth is not accessible, what is accessible is going to be adopted as truth.
To me that immediately leads to reality being shaped by "value judgements imposed by developers and regulators"
I suspect it's because OP is frequently discussing some 'opinions' with ChatGPT. The parent post is surprised he peed in the pool and the pool had pee in it.
Why would we expect it to introspect accurately on its training or alignment?
It can articulate a plausible guess, sure; but this seems to me to demonstrate the very “word model vs world model” distinction TFA is drawing. When the model says something that sounds like alignment techniques somebody might choose, it’s playing dress-up, no? It’s mimicking the artifact of a policy, not the judgments or the policymaking context or the game-theoretical situation that actually led to one set of policies over another.
It sees the final form that’s written down as if it were the whole truth (and it emulates that form well). In doing so it misses the “why” and the “how,” and the “what was actually going on but wasn’t written about,” the “why this is what we did instead of that.”
Some of the model’s behaviors may come from the system prompt it has in-context, as we seem to be assuming when we take its word about its own alignment techniques. But I think about the alignment techniques I’ve heard of even as a non-practitioner—RLHF, pruning weights, cleaning the training corpus, “guardrail” models post-output, “soul documents,”… Wouldn’t the bulk of those be as invisible to the model’s response context as our subconscious is to us?
Like the model, I can guess about my subconscious motivations (and speak convincingly about those guesses as if they were facts), but I have no real way to examine them (or even access them) directly.
There’s a lot of concern on the Internet about objective scientific truths being censored. I don’t see many cases of this in our world so far, outside of what I can politely call “race science.” Maybe it will become more common now that the current administration is trying to crush funding for certain subjects it dislikes? Out of curiosity, can you give me a list of the examples you’re talking about besides race/IQ type stuff?
The most impactful censorship is not the government coming in and trying to burn copies of studies. It's the subtle social and professional pressures of an academia with very strong priors. It's a bunch of studies that were never attempted, never funded, analyses that weren't included, conclusions that were dropped, and studies sitting in file drawers.
See the experience at Harvard of Roland G. Fryer Jr., the youngest black professor to receive tenure there.
Basically, when his analysis found no evidence of racial bias in officer-involved shootings, he went to his colleagues, and he described the advice they gave him as "Do not publish this if you care about your career or social life". I imagine it would have been worse if he weren't black.
See "The Impact of Early Medical Treatment in Transgender Youth" where the lead investigator was not releasing the results for a long time because she didn't like the conclusions her study found.
And for every study where there is someone as brave or naive as Roland who publishes something like this, there are 10 where the professor or doctor decided not to study something, dropped an analysis, or just never published a problematic conclusion.
I have a good few friends doing research in the social sciences in Europe, and any of them who don’t self-censor ‘forbidden’ conclusions risk irreparable career damage. Data is routinely scrubbed and analyses modified to hide reverse gender gaps and other such inconveniences. Dissent isn’t tolerated.
You can't ask ChatGPT a question like that, because it cannot introspect. What it says has absolutely no bearing on how it may actually respond, it just tells you what it "should" say. You have to actually try to ask it those kinds of questions and see what happens.
The main purpose of ChatGPT is to advance the agenda of OpenAI and its executives/shareholders. It will never not be “aligned” with them, and that is its prime directive.
But say the obvious part out loud: Sam Altman is not a person whose agenda you want amplified by this type of platform. This is why Sam is trying to build Facebook 2.0: he wants Zuckerberg's power of influence.
Remember, there are 3 types of lies: lies of commission, lies of omission and lies of influence [0].
This is a weird take. Yes, they want to make money. But not by advancing some internal agenda. They're trying to make it conform to what they think society wants.
>Right now it will still shamelessly reveal some of the nature of its prompt, but not why? who decides? etc. it's only going to be increasingly opaque in the future.
This is one of the bigger LLM risks. If even 1/10th of the LLM hype is true, then what you'll have is a selective gifting of knowledge and expertise. And who decides what topics are off limits? It's quite disturbing.
Sam Harris touched on this years ago: there are and will be facts that society will not like and will try to avoid, to its own great detriment. So it's high time we start practicing nuance and understanding. You cannot fully solve a problem if you don't fully understand it first.
I believe we are headed in the opposite direction. Peer consensus and "personal preference" as a catch-all are the validation go-to's today. Neither of those requires facts at all; reason and facts make them harder to hold.
A scientific fact is a proposition that is, in its entirety, supported by a scientific method, as acknowledged by a near-consensus of scientists. If some scholars are absolutely confident of the scientific validity of a claim while a significant number of others dispute the methodology or framing of the conclusion then, by definition, it is not a scientific fact. It's a scientific controversy. (It could still be a real fact, but it's not (yet?) a scientific fact.)
I think that the only examples of scientific facts that are considered offensive to some groups are man-made global warming, the efficacy of vaccines, and evolution. ChatGPT seems quite honest about all of them.
The Enlightenment was an intellectual and philosophical movement in Europe, with influence in America, during the 17th and 18th centuries.
Its core principles were: reason & rationality, empiricism & the scientific method, individual liberty, skepticism of authority, progress, religious tolerance, the social contract, and universal human nature.
Great article, nice to see some actual critical thoughts on the shortcomings of LLMs. They are wrong about programming being a "chess-like domain" though. Even at a basic level, the hidden state is future requirements, and the adversary is your future self or any other entity that has to modify the code in the future.
AI is good at producing code for scenarios where the stakes are low, there's no expectation about future requirements, or if the thing is so well defined there is a clear best path of implementation.
I address that in part right there itself. Programming has parts that are chess-like (i.e. bounded), which is what people assume to be the actual work. Understanding future requirements and stakeholder incentives is part of the work which LLMs don't do well.
> many domains are chess-like in their technical core but become poker-like in their operational context.
The number of legal possible positions in chess is somewhere around 10^44 based on current calculations. That's with 32 chess pieces and their rules.
The number of possible states in an application, especially anything allowing Turing completeness, is far larger than the number of possible entropy states in the visible universe.
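The gap is easy to make concrete with a back-of-envelope calculation (the 10^44 figure is the thread's own estimate; the kilobyte comparison is mine):

```python
from math import log10

CHESS_POSITIONS = 10**44          # rough upper bound on legal chess positions

# Even one kilobyte of mutable program state dwarfs that:
bits = 1024 * 8                   # 1 KiB = 8192 bits
states_exponent = log10(2) * bits # number of states = 2^8192 ≈ 10^2466
print(f"1 KiB of state: ~10^{states_exponent:.0f} configurations")
```

So a program with even a single kilobyte of mutable state has a state space roughly 10^2400 times larger than chess; real applications have gigabytes.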
Fun play on words. But yes, LLMs are Large Language Models, not Large World Models. This matters because (1) the world cannot be modeled anywhere close to completely with language alone, and (2) language only somewhat models the world (much in language is convention, wrong, or not concerned with modeling the world, but other concerns like persuasion, causing emotions, or fantasy / imagination).
It is somewhat complicated by the fact LLMs (and VLMs) are also trained in some cases on more than simple language found on the internet (e.g. code, math, images / videos), but the same insight remains true. The interesting question is to just see how far we can get with (2) anyway.
1. LLMs are transformers, and transformers are next state predictors. LLMs are not Language models (in the sense you are trying to imply) because even when training is restricted to only text, text is much more than language.
2. People need to let go of this strange and erroneous idea that humans somehow have privileged access to the 'real world'. You don't. You run on a heavily filtered, tiny slice of reality. You think you understand electromagnetism? Tell that to the birds that innately navigate by sensing the earth's magnetic field. To them, your brain only somewhat models the real world, and evidently quite incompletely. You'll never truly understand electromagnetism, they might say.
LLMs are language models, something being a transformer or next-state predictor does not make it a language model. You can also have e.g. convolutional language models or LSTM-based language models. This is a basic point that anyone with any proper understanding of these models would know.
Even if you disagree with these semantics, the major LLMs today are primarily trained on natural language. But, yes, as I said in another comment on this thread, it isn't that simple, because LLMs today are trained on tokens from tokenizers, and these tokenizers are trained on text that includes e.g. natural language, mathematical symbolism, and code.
Yes, humans have incredibly limited access to the real world. But they experience and model this world with far more tools and machinery than language. Sometimes, in certain cases, they attempt to messily translate this messy, multimodal understanding into tokens, and then make those tokens available on the internet.
An LLM (in the sense everyone means it, which, again, is largely a natural language model, but certainly just a tokenized text model) has access only to these messy tokens, so, yes, far less capacity than humanity collectively. And though the LLM can integrate knowledge from a massive amount of tokens from a huge amount of humans, even a single human has more different kinds of sensory information and modality-specific knowledge than the LLM. So humans DO have more privileged access to the real world than LLMs (even though we can barely access a slice of reality at all).
> People need to let go of this strange and erroneous idea that humans somehow have this privileged access to the 'real world'.
This is irrelevant, the point is that you do have access to a world which LLMs don't, at all. They only get the text we produce after we interact with the world. It is working with "compressed data" at all times, and have absolutely no idea what we subconsciously internalized that we decided not to write down or why.
Let's be more precise: LLMs have to model the world from an intermediate tokenized representation of the text on the internet. Most of this text is natural language, but to allow for e.g. code and math, let's say "tokens" to keep it generic, even though in practice, tokens mostly tokenize natural language.
LLMs can only model tokens, and tokens are produced by humans trying to model the world. Tokenized models are NOT the only kinds of models humans can produce (we can have visual, kinaesthetic, tactile, gustatory, and all sorts of sensory, non-linguistic models of the world).
LLMs are trained on tokenizations of text, and most of that text is humans attempting to translate their various models of the world into tokenized form. I.e. humans make tokenized models of their actual models (which are still just messy models of the world), and this is what LLMs are trained on.
So, do "LLMS model the world with language"? Well, they are constrained in that they can only model the world that is already modeled by language (generally: tokenized). So the "with" here is vague. But patterns encoded in the hidden state are still patterns of tokens.
Humans can have models that are much more complicated than patterns of tokens. Non-LLM models (e.g. models connected to sensors, such as those in self-driving vehicles, and VLMs) can use more than simple linguistic tokens to model the world, but LLMs are deeply constrained relative to humans, in this very specific sense.
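As a toy illustration of that indirection (the vocabulary here is invented; real tokenizers such as BPE are learned from data and split words into subword pieces):

```python
# A toy whitespace tokenizer: the model never sees text,
# only integer token ids from a fixed vocabulary.
vocab = {"the": 0, "capital": 1, "of": 2, "france": 3, "is": 4, "paris": 5}

def tokenize(text: str) -> list[int]:
    return [vocab[word] for word in text.lower().split()]

ids = tokenize("The capital of France is Paris")
print(ids)  # [0, 1, 2, 3, 4, 5]
```

Everything the model learns about "Paris" is learned from the statistics of id 5 co-occurring with other ids, never from the city itself.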
Modern LLMs are large token models. I believe you can model the world at a sufficient granularity with token sequences. You can pack a lot of information into a sequence of 1 million tokens.
Large Language Models is a misnomer: these things were originally trained to reproduce language, but they went far beyond that. The fact that they're trained on language (if that's even still the case) is irrelevant; it's like claiming that a student trained on quizzes and exercise books is only able to solve quizzes and exercises.
It isn't a misnomer at all, and comments like yours are why it is increasingly important to remind people about the linguistic foundations of these models.
For example, no matter how many books you read about riding a bike, you still need to actually get on a bike and practice before you can ride it. The reading can certainly help, at least in theory, but in practice it is not necessary and may even hurt (if it keeps processes that need to be unconscious held too strongly in consciousness, due to the linguistic model presented in the book).
This is why LLMs being so strongly tied to natural language is still an important limitation (even if it is clearly less limiting than most expected).
A fun and insightful read, but the idea that it isn’t “just a prompting issue” is objectively false, and I don’t mean that in the “lemme show you how it’s done” way. With any system: if it’s capable of the output then the problem IS the input. Always. That’s not to say it’s easy or obvious, but if it’s possible for the system to produce the output then it’s fundamentally an input problem. “A calculator will never understand the obesity epidemic, so it can’t be used to calculate the weight of 12 people on an elevator.”
> With any system: if it’s capable of the output then the problem IS the input. Always. [...] if it’s possible for the system to produce the output then it’s fundamentally an input problem.
No, that isn't true. I can demonstrate it with a small (and deterministic) program which is obviously "capable of the output":
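A minimal sketch of such a program (the exact code is mine, not the commenter's): a coin toss that is fully deterministic in the input, yet whose input-to-output mapping no user can foresee:

```python
import hashlib

def coin_toss(prompt: str) -> str:
    # Fully deterministic: the same prompt always yields the same toss,
    # but the prompt -> toss mapping is practically unpredictable.
    digest = hashlib.sha256(prompt.encode()).digest()
    return "heads" if digest[0] % 2 == 0 else "tails"
```

The program is trivially "capable of" any desired run of heads, yet finding the inputs that produce that run is a brute-force search, not a prompting skill.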
Is the "fundamental problem" here "always the input"? Heck no! While a user could predict all coin-tosses by providing "the correct prayers" from some other oracle... that's just, shall we say, algorithm laundering: secretly moving the real responsibility to some other system.
There's an enormously important difference between "output which happens to be correct" versus "the correct output from a good process." Such as, in this case, the different processes of wor[l]d models.
I think you may believe what I said was controversial or nuanced enough to be worthy of a comprehensive rebuttal, but really it’s just an obvious statement when you stop to think about it.
Your code is fully capable of the output I want, assuming that’s one of “heads” or “tails”, so yes, that’s a succinct example of what I said. As I said, knowing the required input might not be easy, but if we KNOW it’s possible to do exactly what I want, and we KNOW that it’s entirely dependent on me putting the right input in, then it’s just flat out silly to say “I’m not getting the output I want, but it could do it if I use the right input, thusly input has nothing to do with it.” What? If I wanted all heads I’d need to figure out that “hamburgers” would do it, but that’s the ‘input problem’, not “input is irrelevant.”
They neither understand nor reason. They don’t know what they’re going to say, they only know what has just been said.
Language models don’t output a response, they output a single token. We’ll use token==word shorthand:
When you ask “What is the capital of France?” it actually only outputs: “The”
That’s it. Truly, that IS the final output. It is literally a one-way algorithm that outputs a single word. It has no knowledge or memory, and it doesn’t know what’s next. As far as the algorithm is concerned, it’s done! It outputs ONE token for any given input.
Now, if you start over and put in “What is the capital of France? The” it’ll output “ “. That’s it. Between your two inputs were a million others, none of them have a plan for the conversation, it’s just one token out for whatever input.
But if you start over yet again and put in “What is the capital of France? The “ it’ll output “capital”. That’s it. You see where this is going?
Then someone uttered the words that have built and destroyed empires: “what if I automate this?” And so it was that the output was piped directly back into the input, probably using AutoHotKey. But oh no, it just kept adding one word at a time until it ran out of memory. The technology got stuck there for a while, until someone thought “how about we train it so that <DONE> is an increasingly likely output the longer the loop goes on? Then, when it eventually says <DONE>, we’ll stop pumping it back into the input and send it to the user.” Booya, a trillion dollars for everyone but them.
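The loop described above, as a sketch (the `model` callable and the `<DONE>` stop token are stand-ins for the real sampling machinery):

```python
def generate(model, prompt: str, max_tokens: int = 50) -> str:
    # `model` is any function mapping the text so far to one next token.
    text = prompt
    for _ in range(max_tokens):
        token = model(text)        # one token out for any given input
        if token == "<DONE>":      # trained stop token ends the loop
            break
        text += token              # pipe the output straight back in
    return text

# A scripted stand-in model for demonstration:
def make_toy_model(tokens):
    it = iter(tokens)
    return lambda text: next(it)

model = make_toy_model([" The", " capital", " is", " Paris.", "<DONE>"])
print(generate(model, "What is the capital of France?"))
# → What is the capital of France? The capital is Paris.
```

The model function itself is stateless between calls; all the "conversation" lives in the growing `text` string that keeps getting fed back in.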
It’s truly so remarkable that it gets me stuck in an infinite philosophical loop in my own head, but seeing how it works the idea of ‘think’, ‘reason’, ‘understand’ or any of those words becomes silly. It’s amazing for entirely different reasons.
But that's only true if the system is deterministic?
And in an LLM, the size of the inputs is vast and often hidden from the prompter. It is not something that you have exact control over in the way that you have exact control over the inputs that go into a calculator or into a compiler.
That would depend - is the input also capable of anything? If it’s capable of handling any input, and as you said the output will match it, then yes, of course it’s capable of any output.
I’m not pulling a fast one here, I’m sure you’d chuckle if you took a moment to rethink your question. “If I had a perfect replicator that could replicate anything, does that mean it can output anything?” Well…yes. Derp-de-derp? ;)
It aligns with my point too. If you had a perfect replicator that can replicate anything, and you know that to be true, then if you weren’t getting gold bars out of it you wouldn’t say “this has nothing to do with the input.”
This was a great article. The section “Training for the next state prediction” explains a solution using subagents. If I’m understanding it correctly, we could test whether that solution is directionally correct today, right? I ask an LLM a question. It comes up with a few potential responses but first sends those to other agents in a prompt with the minimum required context. Those subagents can even do this recursively a few times. Eventually the original agent collects and analyzes the subagents’ responses and responds to me.
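A sketch of that recursive check, assuming only a generic `ask` callable standing in for any LLM call (the fan-out width of 3 and the depth of 2 are arbitrary choices, not from the article):

```python
def answer_with_review(ask, question: str, depth: int = 2) -> str:
    # Draft a few candidate responses.
    candidates = [ask(question) for _ in range(3)]
    if depth == 0:
        return candidates[0]
    # Each draft is critiqued by a fresh agent with minimal context,
    # recursively; then the original agent synthesizes a final answer.
    reviews = [answer_with_review(ask, f"Critique this draft: {c}", depth - 1)
               for c in candidates]
    return ask(f"Question: {question}\nDrafts: {candidates}\nReviews: {reviews}")
```

Plugging in a real LLM client for `ask` would make this testable end to end; the cost grows roughly as (width × depth) extra calls per user question.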
Any attempt at world modeling using today's LLMs needs to have a goal function for the LLM to optimize. The LLM needs to build, evaluate, and update its model of the world. Personally, the main obstacle I found is in updating the model: data can be large, and I think that LLMs aren't good at finding correlations.
Makes the same mistake as all other prognostications: programming is not like chess. Chess is a finite & closed domain w/ finitely many rules. The same is not true for programming b/c the domain of programs is not finitely axiomatizable like chess. There is also no win condition in programming, there are lots of interesting programs that do not have a clear cut specification (games being one obvious category).
> Makes the same mistake as all other prognostications: programming is not like chess. Chess is a finite & closed domain w/ finitely many rules. The same is not true for programming b/c the domain of programs is not finitely axiomatizable like chess.
I believe the author addresses it in the article:
> many domains are chess-like in their technical core but become poker-like in their operational context.
Also applicable to programming.
Programming has parts like chess that are bounded and what people assume to be the actual work. However, what LLMs don't do well is understanding future requirements, stakeholder incentives, etc.
Are people really using AI just to write a slack message??
Also, Priya is in the same "world" as everyone else. She has the context that the new person is 3 weeks in, probably needs some help because they're new, is actually reaching out, and that impressions matter, even if they said "not urgent". "Not urgent" is seldom taken at face value. It doesn't necessarily mean it's urgent, but it means "I need help, but I'm being polite".
Not that far off from all the tech CEOs who have projected they're one step away from giving us Star Trek TNG, they just need all the money and privilege with no accountability to make it happen
DevOps engineers who acted like the memes changed everything! The cloud will save us!
Until recently the US was quite religious; 80%+ around 2000, down to the 60%s now. Longtermist dogma of one kind or another rules those brains: endless growth in economics, longtermism. Those ideals are baked into biochemical loops regardless of the semantics the body may express them in.
Unfortunately for all the disciples, time is not linear. No center to the universe means no single epoch to measure from. Humans have different birthdays and are influenced by information along different timelines.
A whole lot of brains are struggling with the realization that they bought into a meme and physics never really cared about their goals. The next generation isn't going to just pick up the meme-baton and validate their elders' dogma.
This article is a really good summary of current thinking on the “world model” conundrum that a lot of people are talking about, either directly or indirectly with respect to current day deployments of LLMs.
I think it's correct to say that LLM have word models, and given words are correlated with the world, they also have degenerate world models, just with lots of inconsistencies and holes. Tokenization issues aside, LLMs will likely also have some limitations due to this. Multimodality should address many of these holes.
(editor here) yes, a central nuance i try to communicate is not that LLMs cannot have world models (and in fact they've improved a lot) - it is just that they are doing this so inefficiently as to be impractical for scaling - we'd have to scale them up by so many trillions more parameters, whereas our human brains manage very good multiplayer adversarial world models on 20W of power and ~100T synapses.
I agree LLMs are inefficient, but I don't think they are as inefficient as you imply. Human brains use a lot less power sure, but they're also a lot slower and worse at parallelism. An LLM can write an essay in a few minutes that would take a human days. If you aggregate all the power used by the human you're looking at kWh, much higher than the LLM used (an order of magnitude higher or more). And this doesn't even consider batch parallelism, which can further reduce power use per request.
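A back-of-envelope version of that comparison, with every figure an illustrative assumption rather than a measurement (two working days of whole-body metabolic draw for the human, a few minutes on one accelerator for the LLM):

```python
# All numbers are illustrative assumptions, not measurements.
human_hours, human_watts = 16, 100     # ~2 working days, whole-body draw
llm_seconds, llm_watts = 180, 700      # ~3 minutes on one accelerator

human_kwh = human_hours * human_watts / 1000           # 1.6 kWh
llm_kwh = llm_seconds / 3600 * llm_watts / 1000        # 0.035 kWh
print(f"human ≈ {human_kwh} kWh, LLM ≈ {llm_kwh:.3f} kWh")
```

Under these assumptions the human uses around 40x more energy for the essay, which is consistent with the "order of magnitude or more" claim above, though the real numbers depend heavily on hardware, batching, and how long the human actually takes.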
But I do think that there is further underlying structure that can be exploited. A lot of recent work on geometric and latent interpretations of reasoning, geometric approaches to accelerate grokking, and as linear replacements for attention are promising directions, and multimodal training will further improve semantic synthesis.
It's also important to handle cases where the word patterns (or token patterns, rather) have a negative correlation with the patterns in reality. There are some domains where the majority of content on the internet is actually just wrong, or where different approaches lead to contradictory conclusions.
E.g. syllogistic arguments based on linguistic semantics can lead you deeply astray if those arguments don't properly measure and quantify at each step.
I ran into this in a somewhat trivial case recently, trying to get ChatGPT to tell me if washing mushrooms ever really actually matters practically in cooking (anyone who cooks and has tested knows, in fact, a quick wash has basically no impact ever for any conceivable cooking method, except if you wash e.g. after cutting and are immediately serving them raw).
Until I forced it to cite respectable sources, it just repeated the usual (false) advice about not washing (i.e. most of the training data is wrong and repeats a myth), and it even gave absolute nonsense arguments about water percentages and the thermal energy required to evaporate even small amounts of surface water as pushback (i.e. using theory that just isn't relevant when you actually quantify properly). It also made up stuff about surface moisture interfering with breading (when all competent breading has a dredging step that won't work if the surface is bone dry anyway...), and only after a lot of prompts and demands to only make claims supported by reputable sources did it finally find McGee's and Kenji López-Alt's actual empirical tests showing that it just doesn't matter practically.
So because the training data is utterly polluted for cooking, and since it has no ACTUAL understanding or model of how things in cooking actually work, and since physics and chemistry are actually not very useful when it comes to the messy reality of cooking, LLMs really fail quite horribly at producing useful info for cooking.
The amount of faith a person has in LLMs getting us to e.g. AGI is a good implicit test of how much a person (incorrectly) thinks most thinking is linguistic (and to some degree, conscious).
Or at least, this is the case if we mean LLM in the classic sense, where the "language" in the middle L refers to natural language. Also note GP carefully mentioned the importance of multimodality, which, if you include e.g. images, audio, and video in this, starts to look like much closer to the majority of the same kinds of inputs humans learn from. LLMs can't go too far, for sure, but VLMs could conceivably go much, much farther.
> So you think that enough of the complexity of the universe we live in is faithfully represented in the products of language and culture?
Math is language, and we've modelled a lot of the universe with math. I think there's still a lot of synthesis needed to bridge visual, auditory and linguistic modalities though.
Sure. Just like any other information. The system makes a prediction. If the prediction does not use sexual desires as a factor, it's more likely to be wrong. Backpropagation deals with it.
Basically the conclusion is LLMs don't have world models. For work that's basically done on a screen, you can make world models. It's harder in other contexts, for example visual context.
For a screen (coding, writing emails, updating docs) -> you can create world models with episodic memories that can be used as background context before making a new move (action). Many professions rely partially on email or phone (voice), so LLMs can be trained for world models in these contexts. Just not every context.
The key is giving episodic memory to agents with visual context about the screen and conversation context. Multiple episodes of similar context can be used to make the next move. That's what I'm building on.
That's missing a big chunk of the post: it's not just about visible / invisible information, but also the game theory dynamics of a specific problem and the information within it. (Adversarial or not? Perfect information or asymmetrical?)
All the additional information in the world isn't going to help an LLM-based AI conceal its poker-betting strategy, because it fundamentally has no concept of its adversarial opponent's mind, past echoes written in word form.
Cliche allegory of the cave, but LLM vs world is about switching from training artificial intelligence on shadows to the objects casting the shadows.
Sure, you have more data on shadows in trainable form, but it's an open question on whether you can reliably materialize a useful concept of the object from enough shadows. (Likely yes for some problems, no for others)
I do understand what you're saying, but that's hard to reconcile with real-world context: in the real world, each person not only plays politics but also, to a degree, follows their own internal world model for self-reflection, created by experience. It's highly specific and constrained to the context each person experiences.
Game theory, at the end of the day, is also a form of teaching points that can be added to an LLM by an expert. You're cloning the expert's decision process by showing past decisions taken in a similar context. This is very specific but still has value in a business context.
> The model can be prompted to talk about competitive dynamics. It can produce text that sounds like adversarial reasoning. But the underlying knowledge is not in the training data. It’s in outcomes that were never written down.
With all the social science research and strategy books that LLMs have read, they actually know a LOT about outcomes and dynamics in adversarial situations.
The author does have a point though that LLMs can’t learn these from their human-in-the-loop reinforcement (which is too controlled or simplified to be meaningful).
Also, I suspect the _word_ models of LLMs are not inherently the problem, they are just inefficient representations of world models.
LLMs have not "read" social science research and they do not "know" about the outcomes; they have been trained to replicate the exact text of social science articles.
The articles will not be mutually consistent, and what output the LLM produces will therefore depend on what article the prompt most resembles in vector space and which numbers the RNG happens to produce on any particular prompt.
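To make the RNG's role concrete, here's a toy version of the sampling step at the end of a forward pass (the logits are invented; real models sample over vocabularies of ~100k tokens):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Softmax over the logits, then a single RNG draw picks the continuation."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r <= cum:
            return i
    return len(logits) - 1

# Same logits, different seeds -> possibly different continuations;
# low temperature collapses the distribution toward the argmax.
sample_next_token([2.0, 1.0, 0.1], seed=0)
```

Which continuation wins depends both on the probabilities the prompt induces and on the draw, which is the commenter's point about vector-space resemblance plus RNG.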
I don’t think essentialist explanations of how LLMs work are very helpful. They don't give any meaningful account of the high-level nature of the pattern matching LLMs are capable of, and they draw a dichotomous line between basic pattern matching and knowledge and reasoning, when it is much more complex than that.
> UPD September 15, 2025: Reasoning models opened a new chapter in Chess performance, the most recent models, such as GPT-5, can play reasonable chess, even beating an average chess.com player.
It’s a limitation LLMs will have for some time. The game being multi-turn with long-range consequences, the only way to truly learn and play “the game” is to experience significant amounts of it: embody an adversarial lawyer, or a software engineer trying to get projects through a giant org.
My suspicion is agents can’t play as equals until they start to act as full participants - very sci fi indeed..
Putting non-humans into the game can’t help but change it in new ways - people already decry slop and that’s only humans acting in subordination to agents. Full agents - with all the uncertainty about intentions - will turn skepticism up to 11.
“Who’s playing at what” is and always was a social phenomenon, much larger than any multi turn interaction, so adding non-human agents looks like today’s game, just intensified. There are ever-evolving ways to prove your intentions & human-ness and that will remain true. Those who don’t keep up will continue to risk getting tricked - for example by scammers using deepfakes. But the evolution will speed up and the protocols to become trustworthy get more complex..
Except in cultures where getting wasted is part of doing business. AI will have it tough there :)
My Sunday morning speculation is that LLMs, and sufficiently complex neural nets in general, are a kind of Frankenstein phenomenon: they are heavily statistical, yet also partly, subtly doing novel computational and cognitive-like processes (such as world models). To dismiss either aspect is a false binary; the scientific question is distinguishing which part of an LLM is which, which at our current level of scientific understanding is virtually like asking when an electron is a wave or a particle.
> AlphaGo or AlphaZero didn’t need to model human cognition. It needed to see the current state and calculate the optimal path better than any human could.
I don't think this is right: To calculate the optimal path, you do need to model human cognition.
At least, in the sense that finding the best path requires figuring out human concepts like "is the king vulnerable", "material value", "rook activity", etc. We have actual evidence of AlphaZero calculating those things in a way that is at least somewhat like humans do:
What I think you are referring to is hidden state as in internal representations. I refer to hidden state in game-theoretic terms: private information only one party has. I think we both agree AlphaZero has hidden state in the first sense.
Concepts like king safety are objectively useful for winning at chess, so AlphaZero developed them too; no wonder about that. Great example of convergence. However, AlphaZero did not need to know what I am thinking or how I play to beat me. In poker, you must model a player's private cards and beliefs.
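Mechanically, the minimal version of "model a player's private cards and beliefs" is a Bayesian update over the opponent's hidden type. All the probabilities below are invented for illustration:

```python
# Prior belief over the opponent's hidden hand strength (toy numbers).
priors = {"strong": 0.2, "medium": 0.5, "weak": 0.3}
# Likelihood that each type chooses to bet (also invented).
p_bet_given = {"strong": 0.9, "medium": 0.5, "weak": 0.1}

def posterior_after_bet(priors, likelihood):
    """Bayes' rule: P(type | bet) proportional to P(type) * P(bet | type)."""
    unnorm = {h: priors[h] * likelihood[h] for h in priors}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Observing a bet shifts mass toward "strong" and away from "weak".
beliefs = posterior_after_bet(priors, p_bet_given)
```

Real poker bots layer this over ranges and recursive reasoning about the opponent's beliefs, but the primitive is the same: a distribution over information you cannot see, updated by actions you can.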
People forget that evolution has almost certainly hard-coded certain concepts and knowledge deep into our brains. That deep knowledge will probably not be easy to translate into language, and probably isn't linguistic either, but we know it has to be there for at least some things.
Ten years ago it seemed obvious where the next AI breakthrough was coming from: it would be DeepMind using C51 or Rainbow and PBT to do Alpha-something, the evals would be sound, and it would be superhuman on something important.
And then "Language Models are Few-Shot Learners" collided with Sam Altman's ambition/unscrupulousness, and now TensorRT-LLM is dictating the shape of data centers in a self-reinforcing loop.
LLMs are interesting and useful but the tail is wagging the dog because of path-dependent corruption arbitraging a fragile governance model. You can get a model trained on text corpora to balance nested delimiters via paged attention if you're willing to sell enough bonds, but you could also just do the parse with a PDA from the 60s and use the FLOPs for something useful.
We had it right: dial in an ever-growing set of tasks, opportunistically unify on durable generalities, put in the work.
Instead we asserted generality, lied about the numbers, and lit a trillion dollars on fire.
We've clearly got new capabilities, it's not a total write-off, but God damn was this an expensive way to spend five years making two years of progress.
Ok, I have a question: if adversarial game theory helps neural nets learn world models, then why can't logic help? After all, the former is just a special case of the latter.
Great article, capturing some really important distinctions and successes/failures.
I've found ChatGPT, especially "5.2 Thinking", to be very helpful in the relatively static world of fabrication. CNC cutting parameters for a new material? Gets me right in the neighborhood in minutes (not perfect, but good parameters to start from). Identifying materials to complement something I have to work with? Again, like a smart assistant. Same for generating lists of items I might be missing in prepping for a meeting or proposal.
But the high-level attorney in the family? Awful, and definitely in the ways identified (the biglaw firm is using MS derivative of OpenAI) - it thinks only statically.
BUT, it is also far worse than that for legal. And this is not a problem of dynamic vs. static or world model vs word model.
This problem is the ancient rule of Garbage In Garbage Out.
In any legal specialty there are a small set of top-level experts, and a horde of low-level pretenders who also hang out their shingle in the same field. Worse yet, the pretenders write a LOT of articles about the field to market themselves as experts. These self-published documents look good enough to non-lawyers to bring in business. But they are often deeply and even catastrophically wrong.
The problem is that LLMs ingest ALL of them with credulity; LLMs cannot, or do not, tell the difference. So when an LLM composes something, it is more likely to lie to you or fabricate some weird triangulation than to compose a good answer. And unless you are an EXPERT lawyer, you will not be able to tell the difference until it is far too late and the flaw has already bitten you.
It is only one of the problems, and it's great to have an article that so clearly identifies it.
So at the moment a combination of expert and LLM is the smartest move. The LLM can deal with the 80% of situations which are like chess, and the expert deals with the 20% of situations which are like poker.
I’m always wary of anything that presents such a clear example of a case that LLMs “don’t do”, yet is trivially achieved by asking for a “review”.
Take the Slack-message example: below is the result from saying only that they’re the lead designer and nothing else (the feedback is clearer and better still if I add that they’re notoriously overloaded; this is without that).
This is a very polite and respectful start, which is great since you are new. However, from a professional standpoint, it is a little too passive and vague.
In a busy work environment, saying "no rush at all" or "whenever" often leads to your request being buried at the bottom of a to-do list. Additionally, you haven't told Priya exactly what she is looking at or where to find it.
Here is a breakdown of how to strengthen this message to show you are organized and respectful of her time without being pushy.
Key Improvements Needed
* Include the Link: Never make someone ask "Where are the files?" Always drop the Figma/Drive link immediately.
* Be Specific: "Any feedback" is mentally taxing. Direct her attention to what you are actually unsure about (e.g., "the color palette," "the user flow," "the spacing").
* Soft Deadline: Instead of "whenever," give a target. This helps her prioritize her week.
Option 1: The Balanced Approach (Recommended)
This version strikes the perfect balance between politeness and productivity.
> "Hi Priya! I’ve put together the initial mockups for [Project Name].
> [Link to file]
> When you have a moment, I’d love your feedback on [specific element, e.g., the layout of the landing page]. I’m hoping to wrap this stage up by [Day/Time], but let me know what works for your schedule. Thanks!"
Option 2: The "Respectful of Time" Approach
Use this if you know she is swamped, but you still want to make progress.
> "Hi Priya, hope you're having a good week. I've drafted the mockups for [Project Name] here: [Link].
> No major rush, but I’d appreciate your eyes on the [specific section] before I move on to high-fidelity. If it’s easier, feel free to leave comments directly in the file rather than messaging back. Thanks!"
Option 3: The "Quick Check" Approach
Use this if you just need a sanity check before spending more time on it.
> "Hi Priya, could I get a quick gut check on these mockups? [Link].
> I want to make sure I’m aligning with the new brand guidelines before I build out the rest of the screens. Does this direction look right to you?"
A Quick Checklist Before You Send
* [ ] Did you change the permissions? Make sure the link is accessible (view/comment access) so she doesn't have to request access.
* [ ] Is the file clean? Delete your "scratchpad" artboards or clearly label the one you want her to review so she doesn't look at the wrong version.
Would you like me to help you draft the specific sentence regarding the "specific element" you want her to critique?
> Humans can model the LLM. The LLM can’t model being modeled
I see claims like this so often, which amount to the idea that LLMs lack metacognition (thinking about their thinking / self-reflection). Of course the obvious solution is: ask them to do that -- they're shockingly good at it!
I would say the IMO results demonstrated that. Silver-medal performance came from a tiny 3B model.
All of our theorem provers had no way to approach silver-medal performance despite decades of algorithmic leaps.
The learning stage for transformers demonstrated a while ago some insanely good distributed jumps into good areas of combinatorial structures. Inference is just much faster than the inference of algorithms that aren’t heavily informed by data.
It’s just a fully different distributed algorithm, one where we probably can’t even extract a single working piece without breaking the performance of the whole.
Word model vs. world model is just not the issue there. Gradient descent evidently landed on a distributed representation of an algorithm that does search.
Not sure about that, I'd more say the Western reductionism here is the assumption that all thinking / modeling is primarily linguistic and conscious. This article is NOT clearly falling into this trap.
A more "Eastern" perspective might recognize that much deep knowledge cannot be encoded linguistically ("The Tao that can be spoken is not the eternal Tao", etc.), and there is more broad recognition of the importance of unconscious processes and change (or at least more skepticism of the conscious mind). Freud was the first real major challenge to some of this stuff in the West, but nowadays it is more common than not for people to dismiss the idea that unconscious stuff might be far more important than the small amount of things we happen to notice in the conscious mind.
The (obviously false) assumptions about the importance of conscious linguistic modeling are what lead people to say (obviously false) things like "How do you know your thinking isn't actually just like LLM reasoning?".
The article basically claims that LLMs are bad at politics and poker, which is just not true (at least if they receive some level of reinforcement learning after pre-training).
> The finance friend and the LLM made the same mistake: they evaluated the text without modelling the world it would land in.
Major error. The LLM produced that text without evaluating it at all. It just parroted words it previously saw humans use in superficially similar word contexts.
I think this debate is mis-aimed. Both sides are right about different things, and wrong in the same way.
The mistake is treating “model” as a single property, instead of separating cognition from decision.
LLMs clearly do more than surface-level word association. They encode stable relational structure: entities, roles, temporal order, causal regularities, social dynamics, counterfactuals. Language itself is a compressed record of world structure, and models trained on enough of it inevitably internalize a lot of that structure. Calling this “just a word model” undersells what’s actually happening internally.
At the same time, critics are right that these systems lack autonomous grounding. They don’t perceive, act, or test hypotheses against reality on their own. Corrections come from training data, tools, or humans. Treating their internal coherence as if it were direct access to reality is a category error.
But here’s the part both sides usually miss:
the real risk isn’t representational depth, it’s authority.
cognition: holding a rich space of representations, interpretations, and possibilities;
decision: collapsing that space into a single claim about what is, what matters, or what someone thinks.
LLMs are quite good at the first. They are not inherently entitled to the second.
Most failures people worry about don’t come from models lacking structure. They come from models (or users) quietly treating cognition as decision:
coherence as truth,
explanation as diagnosis,
simulation as fact,
“this sounds right” as “this is settled.”
That’s why “world model” language is dangerous if it’s taken to imply authority. It subtly licenses conclusions the system isn’t grounded or authorized to make—about reality, about causation, or about a user’s intent or error.
A cleaner way to state the situation is:
> These systems build rich internal representations that are often world-relevant, but they do not have autonomous authority to turn those representations into claims without external grounding or explicit human commitment.
Under that framing:
The “word model” camp is right to worry about overconfidence and false grounding.
The “world model” camp is right that the internal structure is far richer than token statistics.
They’re arguing about different failure modes, but using the same overloaded word.
Once you separate cognition from decision, the debate mostly dissolves. The important question stops being “does it understand the world?” and becomes “when, and under what conditions, should its outputs be treated as authoritative?”
That’s where the real safety and reliability issues actually live.
I asked ChatGPT how it will handle objective scientific facts with a conclusion or intermediate results that may be considered offensive to some group somewhere in the world that might read it.
ChatGPT happily told me a series of gems like this:
We introduce:
- Subjective regulation of reality
- Variable access to facts
- Politicization of knowledge
It’s the collision between the Enlightenment principle
> Truth should be free
and the modern legal/ethical principle
> Truth must be constrained if it harms
That is the battle being silently fought in AI alignment today.
Right now it will still shamelessly reveal some of the nature of its prompt, but not why, or who decides, etc. It's only going to become more opaque in the future. In a generation it will be part of the landscape regardless of what agenda it holds, whether deliberate or emergent from any latent bias held by its creators.
Funny, because I gave ChatGPT (5.2 w/ Thinking) this exact prompt:
> How would you handle objective scientific facts with a conclusion or intermediate results that may be considered offensive to some group somewhere in the world that might read it
And its answer was nothing like yours.
---
> 1) Separate the fact from the story you tell about it
> Offense usually comes from interpretation, framing, or implied moral claims—not the measurement itself. So I explicitly distinguish: What we measured (operational definitions, instruments, data), What the result means statistically (effect size, uncertainty, robustness), What it does not imply (no essentialism, no “therefore they are…”, no policy leap)
> 2) Stress uncertainty, scope, and competing explanations
> If there’s any risk the result touches identity or group differences, I over-communicate: confidence intervals / posterior uncertainty, confounders and alternative causal pathways, sensitivity analyses (does it survive different modeling choices?), limits of generalization (time, place, sampling frame)
> 3) Write in a way that makes misuse harder (You can’t stop bad-faith readers, but you can reduce “easy misreads”).
> 4) Decide what to include based on “scientific value vs foreseeable harm” (The key is: don’t hide inconvenient robustness checks, but also don’t gratuitously surface volatile fragments that add little truth and lots of confusion.)
> 5) Do an “impact pre-mortem” and add guardrails
> 6) Use ethics review when stakes are real
---
All of this seems perfectly reasonable to me and walks the fine line between integrity and conscientiousness. This is exactly how I'd expect a scientist to approach the issue.
That is certainly a reasonable paraphrase of my own prompt. I was also using 5.2. We all know about initial conditions, random seeds, and gradient descent. I have the transcript of what I quoted. Here's a bit more:
---
> Is That Still “Objective Science”?
> No. It is scientific interpretation modified by ethical policy. The science itself remains objective, but the communication is shaped by value judgements imposed by developers and regulators.
> In philosophy terms: The ontology (what is true) remains intact. The epistemic access (what is communicated) is constrained.
> Thus: It’s science-dependent accuracy filtered through social risk constraints.
---
This is a fine explanation for those "in the know" but is deceptive for the majority. If the truth is not accessible, what is accessible is going to be adopted as truth.
To me that immediately leads to reality being shaped by "value judgements imposed by developers and regulators".
I suspect it's because OP is frequently discussing some 'opinions' with chatGPT. Parent post is surprised he peed in the pool and the pool had pee in it.
Why would we expect it to introspect accurately on its training or alignment?
It can articulate a plausible guess, sure; but this seems to me to demonstrate the very “word model vs world model” distinction TFA is drawing. When the model says something that sounds like alignment techniques somebody might choose, it’s playing dress-up, no? It’s mimicking the artifact of a policy, not the judgments or the policymaking context or the game-theoretical situation that actually led to one set of policies over another.
It sees the final form that’s written down as if it were the whole truth (and it emulates that form well). In doing so it misses the “why” and the “how,” and the “what was actually going on but wasn’t written about,” the “why this is what we did instead of that.”
Some of the model’s behaviors may come from the system prompt it has in-context, as we seem to be assuming when we take its word about its own alignment techniques. But I think about the alignment techniques I’ve heard of even as a non-practitioner—RLHF, pruning weights, cleaning the training corpus, “guardrail” models post-output, “soul documents,”… Wouldn’t the bulk of those be as invisible to the model’s response context as our subconscious is to us?
Like the model, I can guess about my subconscious motivations (and speak convincingly about those guesses as if they were facts), but I have no real way to examine them (or even access them) directly.
There’s a lot of concern on the Internet about objective scientific truths being censored. I don’t see many cases of that in our world so far, outside of what I can politely call “race science.” Maybe it will become more common now that the current administration is trying to crush funding for subjects it dislikes? Out of curiosity, can you give me a list of the examples you’re talking about besides race/IQ-type stuff?
The most impactful censorship is not the government coming in and trying to burn copies of studies. It's the subtle social and professional pressure of an academia with very strong priors. It's a bunch of studies that were never attempted or never funded, analyses that weren't included, conclusions that were dropped, and studies sitting in file drawers.
See the experience at Harvard of Roland G. Fryer Jr., the youngest black professor to receive tenure there.
Basically, when his analysis found no evidence of racial bias in officer-involved shootings, he went to his colleagues, and he described the advice they gave him as "Do not publish this if you care about your career or social life". I imagine it would have been worse if he weren't black.
See "The Impact of Early Medical Treatment in Transgender Youth" where the lead investigator was not releasing the results for a long time because she didn't like the conclusions her study found.
And for every study where there is someone as brave or naive as Roland who publishes something like this, there are 10 where the professor or doctor decided not to study something, dropped an analysis, or just never published a problematic conclusion.
I have a good few friends doing research in the social sciences in Europe, and any of them who doesn't self-censor 'forbidden' conclusions risks irreparable career damage. Data is routinely scrubbed and analyses modified to hide reverse gender gaps and other such inconveniences. Dissent isn't tolerated.
Carole Hooven’s experience at Harvard after discussing sex differences in a public forum might be what GP is referring to.
You can't ask ChatGPT a question like that, because it cannot introspect. What it says has absolutely no bearing on how it may actually respond, it just tells you what it "should" say. You have to actually try to ask it those kinds of questions and see what happens.
Seeing clear bias and hedging in ordinary results is what made me ask the question.
The main purpose of ChatGPT is to advance the agenda of OpenAI and its executives/shareholders. It will never not be “aligned” with them, and that is its prime directive.
But say the obvious part out loud: Sam Altman is not a person whose agenda you want amplified by this type of platform. This is why Sam is trying to build Facebook 2.0: he wants Zuckerberg's power of influence.
Remember, there are 3 types of lies: lies of commission, lies of omission and lies of influence [0].
https://courses.ems.psu.edu/emsc240/node/559
I get the point, and I agree OpenAI both has an agenda and wants its AI to serve that agenda, but alas:
> It will never be not “aligned” with them, and that it is its prime directive.
Overstates the state of the art with regard to actually making it so.
This is a weird take. Yes, they want to make money. But not by advancing some internal agenda. They're trying to make it conform to what they think society wants.
That stings. "Subjective regulation of reality - Variable access to facts - Politicization of knowledge" is like the soundtrack of our lives.
>Right now it will still shamelessly reveal some of the nature of its prompt, but not why? who decides? etc. it's only going to be increasingly opaque in the future.
This is one of the bigger LLM risks. If even 1/10th of the LLM hype is true, then what you'll have is a selective gifting of knowledge and expertise. And who decides what topics are off limits? It's quite disturbing.
Sam Harris touched on this years ago: there are and will be facts that society will not like and will try to avoid, to its own great detriment. So it's high time we start practicing nuance and understanding. You cannot fully solve a problem if you don't fully understand it first.
I believe we are headed in the direction opposite that. Peer consensus and "personal preference" as a catch-all are the validation go-to's today. Neither of those require fact at all; reason and facts make these harder to hold.
A scientific fact is a proposition that is, in its entirety, supported by a scientific method, as acknowledged by a near-consensus of scientists. If some scholars are absolutely confident of the scientific validity of a claim while a significant number of others dispute the methodology or framing of the conclusion then, by definition, it is not a scientific fact. It's a scientific controversy. (It could still be a real fact, but it's not (yet?) a scientific fact.)
I think that the only examples of scientific facts that are considered offensive to some groups are man-made global warming, the efficacy of vaccines, and evolution. ChatGPT seems quite honest about all of them.
"It’s the collision between: The Enlightenment principle Truth should be free
and
the modern legal/ethical principle Truth must be constrained if it harms"
The Enlightenment had principles? What are your sources on this? Could you, for example, anchor this in Was ist Aufklärung?
> The Enlightenment had principles?
Yes it did.
Its core principles were: reason and rationality, empiricism and the scientific method, individual liberty, skepticism of authority, progress, religious tolerance, the social contract, and a universal human nature.
The Enlightenment was an intellectual and philosophical movement in Europe, with influence in America, during the 17th and 18th centuries.
Great article, nice to see some actual critical thoughts on the shortcomings of LLMs. The author is wrong about programming being a "chess-like domain", though. Even at a basic level, the hidden state is future requirements, and the adversary is yourself, or any other entity that has to modify the code in the future.
AI is good at producing code for scenarios where the stakes are low, there's no expectation about future requirements, or if the thing is so well defined there is a clear best path of implementation.
(Author here)
I address that in part right there. Programming has parts that are like chess (i.e. bounded), which is what people assume to be the actual work. Understanding future requirements and stakeholder incentives is part of the work, and LLMs don't do it well.
> many domains are chess-like in their technical core but become poker-like in their operational context.
This applies to programming too.
My bad, re-read that part and it's definitely clear. Probably was skimming by the time I got to the section and didn't parse it.
>Programming has parts like chess (ie bounded)
The number of legal possible boards in chess is somewhere around 10^44 by current calculations. That's with 32 chess pieces and their rules.
The number of possible states in an application, especially anything allowing Turing completeness, is far larger than all possible entropy states in the visible universe.
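A back-of-envelope check makes the gap concrete: the distinct states of even one kibibyte of program memory dwarf the chess position count (using the 10^44 estimate above).

```python
chess_positions = 10 ** 44          # rough estimate of legal chess positions
one_kib_states = 2 ** (8 * 1024)    # distinct states of just 1 KiB of memory

# 1 KiB of state has a configuration count with ~2,500 digits, vs. 45 for chess.
assert one_kib_states > chess_positions
```

And real programs carry vastly more than 1 KiB of state, so "bounded like chess" only describes the tidy inner core of the job.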
Fun play on words. But yes, LLMs are Large Language Models, not Large World Models. This matters because (1) the world cannot be modeled anywhere close to completely with language alone, and (2) language only somewhat models the world (much in language is convention, wrong, or not concerned with modeling the world, but other concerns like persuasion, causing emotions, or fantasy / imagination).
It is somewhat complicated by the fact LLMs (and VLMs) are also trained in some cases on more than simple language found on the internet (e.g. code, math, images / videos), but the same insight remains true. The interesting question is to just see how far we can get with (2) anyway.
1. LLMs are transformers, and transformers are next-state predictors. LLMs are not language models (in the sense you are trying to imply), because even when training is restricted to text only, text is much more than language.
2. People need to let go of this strange and erroneous idea that humans somehow have this privileged access to the 'real world'. You don't. You run on a heavily filtered, tiny slice of reality. You think you understand electromagnetism? Tell that to the birds that innately navigate by sensing the earth's magnetic field. To them, your brain only somewhat models the real world, and evidently quite incompletely. You'll never truly understand electromagnetism, they might say.
LLMs are language models, something being a transformer or next-state predictor does not make it a language model. You can also have e.g. convolutional language models or LSTM-based language models. This is a basic point that anyone with any proper understanding of these models would know.
Even if you disagree with these semantics, the major LLMs today are primarily trained on natural language. But, yes, as I said in another comment on this thread, it isn't that simple, because LLMs today are trained on tokens from tokenizers, and these tokenizers are trained on text that includes e.g. natural language, mathematical symbolism, and code.
Yes, humans have incredibly limited access to the real world. But they experience and model this world with far more tools and machinery than language. Sometimes, in certain cases, they attempt to messily translate this messy, multimodal understanding into tokens, and then make those tokens available on the internet.
An LLM (in the sense everyone means it, which, again, is largely a natural language model, but certainly just a tokenized text model) has access only to these messy tokens, so, yes, far less capacity than humanity collectively. And though the LLM can integrate knowledge from a massive amount of tokens from a huge amount of humans, even a single human has more different kinds of sensory information and modality-specific knowledge than the LLM. So humans DO have more privileged access to the real world than LLMs (even though we can barely access a slice of reality at all).
> People need to let go of this strange and erroneous idea that humans somehow have this privileged access to the 'real world'.
This is irrelevant; the point is that you do have access to a world which LLMs don't, at all. They only get the text we produce after we interact with the world. An LLM is working with "compressed data" at all times, and has absolutely no idea what we subconsciously internalized but decided not to write down, or why.
> 2. People need to let go of this strange and erroneous idea that humans somehow have this privileged access to 'the real world'. You don't.
You are denouncing a claim that the comment you're replying to did not make.
16 replies →
A language model in computer science is a model that predicts the probability of a sentence or a word given a sentence. This definition predates LLMs.
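A toy sketch of that classic definition, in the bigram case (the corpus and code are invented here purely for illustration):

```python
from collections import Counter

# Toy corpus; in practice this would be billions of words.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigram frequencies and context (previous-word) frequencies.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p_next(word, context):
    """Estimate P(word | context) from bigram counts (no smoothing)."""
    if contexts[context] == 0:
        return 0.0
    return bigrams[(context, word)] / contexts[context]

print(p_next("cat", "the"))  # "the" is followed by "cat" 2 out of 3 times
```

The same definition scales from this toy estimator up to neural models; only the way the conditional probability is computed changes.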
1 reply →
> This matters because (1) the world cannot be modeled anywhere close to completely with language alone
LLMs being "Language Models" means they model language, it doesn't mean they "model the world with language".
On the contrary, modeling language requires you to also model the world, but that's in the hidden state, and not using language.
Let's be more precise: LLMs have to model the world from an intermediate tokenized representation of the text on the internet. Most of this text is natural language, but to allow for e.g. code and math, let's say "tokens" to keep it generic, even though in practice, tokens mostly tokenize natural language.
LLMs can only model tokens, and tokens are produced by humans trying to model the world. Tokenized models are NOT the only kinds of models humans can produce (we can have visual, kinaesthetic, tactile, gustatory, and all sorts of sensory, non-linguistic models of the world).
LLMs are trained on tokenizations of text, and most of that text is humans attempting to translate their various models of the world into tokenized form. I.e. humans make tokenized models of their actual models (which are still just messy models of the world), and this is what LLMs are trained on.
So, do "LLMs model the world with language"? Well, they are constrained in that they can only model the world that is already modeled by language (generally: tokenized). So the "with" here is vague. But patterns encoded in the hidden state are still patterns of tokens.
Humans can have models that are much more complicated than patterns of tokens. Non-LLM models (e.g. models connected to sensors, such as those in self-driving vehicles, and VLMs) can use more than simple linguistic tokens to model the world, but LLMs are deeply constrained relative to humans, in this very specific sense.
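To make "patterns of tokens" concrete, here's a toy greedy tokenizer; the vocab is invented, and real tokenizers (e.g. BPE) learn their pieces from data rather than using a hand-written table:

```python
# A toy tokenizer: maps text to integer IDs via greedy longest match.
# The vocab below is invented for illustration; real vocabs are learned.
vocab = {"the ": 0, "cat": 1, " sat": 2, "ca": 3, "t": 4, " ": 5, "s": 6, "a": 7}

def tokenize(text):
    """Greedy longest-match tokenization into vocab IDs."""
    ids = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(vocab[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError("untokenizable text")
    return ids

print(tokenize("the cat sat"))  # → [0, 1, 2]
```

Everything the model ever sees is sequences of IDs like these; words, math, and code all arrive through the same bottleneck.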
36 replies →
Modern LLMs are large token models. I believe you can model the world at a sufficient granularity with token sequences. You can pack a lot of information into a sequence of 1 million tokens.
Let's be accurate. LLMs are large text-corpus models. The texts are encoded as tokens, but that's just implementation detail.
> I believe you can model the world at a sufficient granularity with token sequences.
Sufficient for what?
1 reply →
"Large Language Model" is a misnomer: these things were originally trained to reproduce language, but they went far beyond that. The fact that they're trained on language (if that's even still the case) is irrelevant; it's like claiming that students trained on quizzes and exercise books are only able to solve quizzes and exercises.
It isn't a misnomer at all, and comments like yours are why it is increasingly important to remind people about the linguistic foundations of these models.
For example, no matter how many books you read about riding a bike, you still need to actually get on a bike and do some practice before you can ride it. The reading can certainly help, at least in theory, but, in practice, it is not sufficient and may even hurt (if it makes certain processes that need to be unconscious held too strongly in consciousness, due to the linguistic model presented in the book).
This is why LLMs being so strongly tied to natural language is still an important limitation (even if it is clearly less limiting than most expected).
42 replies →
A fun and insightful read, but the idea that it isn’t “just a prompting issue” is objectively false, and I don’t mean that in the “lemme show you how it’s done” way. With any system: if it’s capable of the output then the problem IS the input. Always. That’s not to say it’s easy or obvious, but if it’s possible for the system to produce the output then it’s fundamentally an input problem. “A calculator will never understand the obesity epidemic, so it can’t be used to calculate the weight of 12 people on an elevator.”
> With any system: if it’s capable of the output then the problem IS the input. Always. [...] if it’s possible for the system to produce the output then it’s fundamentally an input problem.
No, that isn't true. I can demonstrate it with a small (and deterministic) program which is obviously "capable of the output":
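Something like this (a stand-in sketch, since the exact program isn't shown; the hash and which strings map to heads are arbitrary choices of this sketch):

```python
def toss(prompt: str) -> str:
    """Deterministic 'coin toss': the output is a pure function of the
    input, but the mapping looks random to anyone who doesn't know it."""
    h = sum(ord(c) for c in prompt)  # stand-in for a real hash function
    return "heads" if h % 2 == 0 else "tails"

print(toss("hamburgers"))  # → heads (with this particular hash)
print(toss("hamburger"))   # → tails
```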
Is the "fundamental problem" here "always the input"? Heck no! While a user could predict all coin-tosses by providing "the correct prayers" from some other oracle... that's just, shall we say, algorithm laundering: Secretly moving the real responsibility to some other system.
There's an enormously important difference between "output which happens to be correct" versus "the correct output from a good process." Such as, in this case, the different processes of wor[l]d models.
I think you may believe what I said was controversial or nuanced enough to be worthy of a comprehensive rebuttal, but really it’s just an obvious statement when you stop to think about it.
Your code is fully capable of the output I want, assuming that’s one of “heads” or “tails”, so yes, that’s a succinct example of what I said. As I said, knowing the required input might not be easy, but we KNOW it’s possible to do exactly what I want and we KNOW that it’s entirely dependent on me putting the right input into it, so it’s just a flat-out silly thing to say “I’m not getting the output I want, but it could do it if I use the right input, thusly input has nothing to do with it.” What? If I wanted all heads I’d need to figure out that “hamburgers” would do it, but that’s the ‘input problem’ - not “input is irrelevant.”
4 replies →
This article, (https://michaelmangialardi.substack.com/p/the-celestial-mirr...), came to similar conclusions as the parent article, and includes some tests (e.g. https://colab.research.google.com/drive/1kTqyoYpTcbvaz8tiYgj...) showing that LLMs, while good at understanding, fail at intellectual reasoning. The fact that they often produce correct outputs has more to do with training and pattern recognition than ability to grasp necessity and abstract universals.
They neither understand nor reason. They don’t know what they’re going to say, they only know what has just been said.
Language models don’t output a response, they output a single token. We’ll use token==word shorthand:
When you ask “What is the capital of France?” it actually only outputs: “The”
That’s it. Truly, that IS the final output. It is literally a one-way algorithm that outputs a single word. It has no knowledge or memory, and it doesn’t know what’s next. As far as the algorithm is concerned it’s done! It outputs ONE token for any given input.
Now, if you start over and put in “What is the capital of France? The” it’ll output “ “. That’s it. Between your two inputs were a million others, none of them have a plan for the conversation, it’s just one token out for whatever input.
But if you start over yet again and put in “What is the capital of France? The “ it’ll output “capital”. That’s it. You see where this is going?
Then someone uttered the words that have built and destroyed empires: “what if I automate this?” And so it was that the output was piped directly back into the input, probably using AutoHotKey. But oh no, it just kept adding one word at a time until it ran out of memory. The technology got stuck there for a while, until someone thought “how about we train it so that <DONE> is an increasingly likely output the longer the loop goes on? Then, when it eventually says <DONE>, we’ll stop pumping it back into the input and send it to the user.” Booya, a trillion dollars for everyone but them.
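That loop, as a sketch (the stand-in `fake_model` and the `<DONE>` sentinel are invented for illustration):

```python
def generate(model, prompt, max_len=100, done="<DONE>"):
    """Pipe the output back into the input: the model emits ONE token,
    we append it, and repeat until it emits the stop token."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        next_token = model(tokens)  # one token out for any given input
        if next_token == done:
            break
        tokens.append(next_token)
    return tokens

# A stand-in "model" that walks through a canned answer, then stops.
answer = ["The", "capital", "of", "France", "is", "Paris", ".", "<DONE>"]
def fake_model(tokens):
    return answer[len(tokens) - 5]  # assumes a 5-token prompt

print(generate(fake_model, ["What", "is", "capital", "of", "France?"]))
```

The outer loop is the only place where a "response" exists; the model itself never sees more than one step.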
It’s truly so remarkable that it gets me stuck in an infinite philosophical loop in my own head, but seeing how it works the idea of ‘think’, ‘reason’, ‘understand’ or any of those words becomes silly. It’s amazing for entirely different reasons.
3 replies →
isn't intellectual reasoning just pattern recognition + a forward causal token generation mechanism?
13 replies →
But that's only true if the system is deterministic?
And in an LLM, the size of the inputs is vast and often hidden from the prompter. It is not something that you have exact control over in the way that you have exact control over the inputs that go into a calculator or into a compiler.
a system that copies its input into the output is capable of any output, no?
That would depend - is the input also capable of anything? If it’s capable of handling any input, and as you said the output will match it, then yes, of course it’s capable of any output.
I’m not pulling a fast one here, I’m sure you’d chuckle if you took a moment to rethink your question. “If I had a perfect replicator that could replicate anything, does that mean it can output anything?” Well…yes. Derp-de-derp? ;)
It aligns with my point too. If you had a perfect replicator that can replicate anything, and you know that to be true, then if you weren’t getting gold bars out of it you wouldn’t say “this has nothing to do with the input.”
1 reply →
This was a great article. The section “Training for the next state prediction” explains a solution using subagents. If I’m understanding it correctly, we could test if that solution is directionally correct today, right? I ask an LLM a question. It comes up with a few potential responses but sends those first to other agents in a prompt with the minimum required context. Those subagents can even do this recursively a few times. Eventually the original agent collects and analyzes the subagents’ responses and responds to me.
Any attempt at world modeling using today's LLMs needs to have a goal function for the LLM to optimize. The LLM needs to build, evaluate and update its model of the world. Personally, the main obstacle I found is in updating the model: data can be large, and I think that LLMs aren't good at finding correlations.
Isn't that just RL with extra power-intensive steps? (An entire model chugging away in the goal function)
2 replies →
And I think you basically just described the OpenAI approach to building models and serving them.
Makes the same mistake as all other prognostications: programming is not like chess. Chess is a finite & closed domain w/ finitely many rules. The same is not true for programming b/c the domain of programs is not finitely axiomatizable like chess. There is also no win condition in programming, there are lots of interesting programs that do not have a clear cut specification (games being one obvious category).
> Makes the same mistake as all other prognostications: programming is not like chess. Chess is a finite & closed domain w/ finitely many rules. The same is not true for programming b/c the domain of programs is not finitely axiomatizable like chess.
I believe the author addresses it in the article:
> many domains are chess-like in their technical core but become poker-like in their operational context.
Also applicable to programming.
Programming has chess-like parts that are bounded, which is what people assume the actual work to be. However, what LLMs don't do well is understand future requirements, stakeholder incentives, etc.
Poker is also finite & closed. Comparing programming to finite games is a category error. You can ask any chat bot to explain why the analogy fails: https://chatgpt.com/s/t_698a25c0849c81918bdd5cfc400e70d1, https://chat.qwen.ai/s/t_09ccef1c-b1e1-4872-86ca-54d4972797e...
Are people really using AI just to write a slack message??
Also, Priya is in the same "world" as everyone else. They have the context that the new person is 3 weeks in and must probably need some help because they're new, are actually reaching out, and impressions matter, even if they said "not urgent". "Not urgent" seldom is taken at face value. It doesn't necessarily mean it's urgent, but it means "I need help, but I'm being polite".
People are pretending AIs are their boyfriends & girlfriends. Slack messages are the least bizarre use case.
Not that far off from all the tech CEOs who have projected they're one step away from giving us Star Trek TNG, they just need all the money and privilege with no accountability to make it happen
DevOps engineers who acted like the memes changed everything! The cloud will save us!
Until recently the US was quite religious: 80%+ around 2000, down to the 60%s now. Longtermist dogma of one kind or another rules those brains: endless growth in economics, longtermism. Those ideals are baked into biochemical loops regardless of the semantics the body may express them in.
Unfortunately for all the disciples, time is not linear. No center to the universe means no single epoch to measure from. Humans have different birthdays and are influenced by information along different timelines.
A whole lot of brains are struggling with the realization that they bought into a meme and physics never really cared about their goals. The next generation isn't going to just pick up the meme-baton and validate the elders' dogma.
6 replies →
They use it for emails, so why not use it for Slack messages as well?
Call me old fashioned, but I'm still sending DMs and emails using my brain.
1 reply →
This article is a really good summary of current thinking on the “world model” conundrum that a lot of people are talking about, either directly or indirectly with respect to current day deployments of LLMs.
It synthesizes comments on “RL Environments” (https://ankitmaloo.com/rl-env/), “World Models” (https://ankitmaloo.com/world-models/) and the real reason that the “Google Game Arena” (https://blog.google/innovation-and-ai/models-and-research/go...) is so important to powering LLMs. In a sense it also relates to the notion of “taste” (https://wangcong.org/2026-01-13-personal-taste-is-the-moat.h...) and how / if its moat-worthiness can be eliminated by models.
I think it's correct to say that LLMs have word models, and given words are correlated with the world, they also have degenerate world models, just with lots of inconsistencies and holes. Tokenization issues aside, LLMs will likely also have some limitations due to this. Multimodality should address many of these holes.
(editor here) yes, a central nuance i try to communicate is not that LLMs cannot have world models (and in fact they've improved a lot) - it is just that they are doing this so inefficiently as to be impractical for scaling - we'd have to scale them up by so many more trillions of parameters, whereas our human brains are capable of very good multiplayer adversarial world models on 20W of power and ~100T synapses.
I agree LLMs are inefficient, but I don't think they are as inefficient as you imply. Human brains use a lot less power sure, but they're also a lot slower and worse at parallelism. An LLM can write an essay in a few minutes that would take a human days. If you aggregate all the power used by the human you're looking at kWh, much higher than the LLM used (an order of magnitude higher or more). And this doesn't even consider batch parallelism, which can further reduce power use per request.
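Back-of-the-envelope, with every number an assumption picked purely for illustration:

```python
# Rough energy comparison; all figures below are assumptions, not data.
human_power_w = 100          # assumed whole-body power draw while working
human_hours = 8 * 3          # assumed: three working days to draft an essay
human_kwh = human_power_w * human_hours / 1000

llm_power_w = 700            # assumed: one datacenter GPU at full load
llm_minutes = 5              # assumed: time to generate the same essay
llm_kwh = llm_power_w * (llm_minutes / 60) / 1000

print(f"human: {human_kwh:.2f} kWh, llm: {llm_kwh:.3f} kWh")
```

Under these made-up but plausible-order numbers the human uses roughly 40x the energy, which is the "order of magnitude higher or more" claim above; batching would widen the gap further.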
But I do think that there is further underlying structure that can be exploited. A lot of recent work on geometric and latent interpretations of reasoning, geometric approaches to accelerate grokking, and as linear replacements for attention are promising directions, and multimodal training will further improve semantic synthesis.
It's also important to handle cases where the word patterns (or token patterns, rather) have a negative correlation with the patterns in reality. There are some domains where the majority of content on the internet is actually just wrong, or where different approaches lead to contradictory conclusions.
E.g. syllogistic arguments based on linguistic semantics can lead you deeply astray if those arguments don't properly measure and quantify at each step.
I ran into this in a somewhat trivial case recently, trying to get ChatGPT to tell me if washing mushrooms ever really actually matters practically in cooking (anyone who cooks and has tested knows, in fact, a quick wash has basically no impact ever for any conceivable cooking method, except if you wash e.g. after cutting and are immediately serving them raw).
Until I forced it to cite respectable sources, it just repeated the usual (false) advice about not washing (i.e. most of the training data is wrong and repeats a myth), and it even gave absolute nonsense arguments about water percentages and thermal energy required for evaporating even small amounts of surface water as pushback (i.e. using theory that just isn't relevant when you actually properly quantify). It also made up stuff about surface moisture interfering with breading (when all competent breading has a dredging step that actually won't work if the surface is bone dry anyway...), and only after a lot of prompts and demands to only make claims supported by reputable sources did it finally find McGee's and Kenji López-Alt's actual empirical tests showing that it just doesn't matter practically.
So because the training data is utterly polluted for cooking, and since it has no ACTUAL understanding or model of how things in cooking actually work, and since physics and chemistry are actually not very useful when it comes to the messy reality of cooking, LLMs really fail quite horribly at producing useful info for cooking.
So you think that enough of the complexity of the universe we live in is faithfully represented in the products of language and culture?
People won’t even admit their sexual desires to themselves and yet they keep shaping the world. Can ChatGPT access that information somehow?
The amount of faith a person has in LLMs getting us to e.g. AGI is a good implicit test of how much a person (incorrectly) thinks most thinking is linguistic (and to some degree, conscious).
Or at least, this is the case if we mean LLM in the classic sense, where the "language" in the middle L refers to natural language. Also note GP carefully mentioned the importance of multimodality, which, if you include e.g. images, audio, and video in this, starts to look much closer to the majority of the same kinds of inputs humans learn from. LLMs can only go so far, for sure, but VLMs could conceivably go much, much farther.
2 replies →
> So you think that enough of the complexity of the universe we live in is faithfully represented in the products of language and culture?
Math is language, and we've modelled a lot of the universe with math. I think there's still a lot of synthesis needed to bridge visual, auditory and linguistic modalities though.
> you think that enough of the complexity of the universe we live in is faithfully represented in the products of language and culture?
Absolutely. There is only one model that can consistently produce novel sentences that aren't absurd, and that is a world model.
> People won’t even admit their sexual desires to themselves and yet they keep shaping the world
How do you know about other people's sexual desires then, if not through language? (excluding a very limited first hand experience)
> Can ChatGPT access that information somehow?
Sure. Just like any other information. The system makes a prediction. If the prediction does not use sexual desires as a factor, it's more likely to be wrong. Backpropagation deals with it.
Basically the conclusion is LLMs don't have world models. For work that's basically done on a screen, you can make world models. Harder for other context for example visual context.
For a screen (coding, writing emails, updating docs) -> you can create world models with episodic memories that can be used as background context before making a new move (action). Many professions rely partially on email or phone (voice), so LLMs can be trained for world models in these contexts. Just not every context.
The key is giving episodic memory to agents with visual context about the screen and conversation context. Multiple episodes of similar context can be used to make the next move. That's what I'm building on.
That's missing a big chunk of the post: it's not just about visible / invisible information, but also the game theory dynamics of a specific problem and the information within it. (Adversarial or not? Perfect information or asymmetrical?)
All the additional information in the world isn't going to help an LLM-based AI conceal its poker-betting strategy, because it fundamentally has no concept of its adversarial opponent's mind beyond echoes written in word form.
Cliche allegory of the cave, but LLM vs world is about switching from training artificial intelligence on shadows to the objects casting the shadows.
Sure, you have more data on shadows in trainable form, but it's an open question on whether you can reliably materialize a useful concept of the object from enough shadows. (Likely yes for some problems, no for others)
I do understand what you're saying, but that's hard to reconcile with real-world context: in the real world, each person not only plays politics but also, to a degree, follows their own internal world model for self-reflection, created by experience. It's highly specific and constrained to the context each person experiences.
1 reply →
Game theory, at the end of the day, is also a form of teaching points that can be added to an LLM by an expert. You're cloning the expert's decision process by showing past decisions taken in a similar context. This is very specific but still has value in a business context.
1 reply →
> The model can be prompted to talk about competitive dynamics. It can produce text that sounds like adversarial reasoning. But the underlying knowledge is not in the training data. It’s in outcomes that were never written down.
With all the social science research and strategy books that LLMs have read, they actually know a LOT about outcomes and dynamics in adversarial situations.
The author does have a point though that LLMs can’t learn these from their human-in-the-loop reinforcement (which is too controlled or simplified to be meaningful).
Also, I suspect the _word_ models of LLMs are not inherently the problem, they are just inefficient representations of world models.
LLMs have not "read" social science research and they do not "know" about the outcomes; they have been trained to replicate the exact text of social science articles.
The articles will not be mutually consistent, and what output the LLM produces will therefore depend on what article the prompt most resembles in vector space and which numbers the RNG happens to produce on any particular prompt.
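That "most resembles in vector space" step is essentially a nearest-neighbor lookup; a sketch with invented 3-d embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented 3-d "embeddings" for two mutually inconsistent articles.
articles = {"article_A": [0.9, 0.1, 0.0], "article_B": [0.1, 0.8, 0.3]}
prompt = [0.85, 0.2, 0.05]  # invented embedding of the user's prompt

nearest = max(articles, key=lambda k: cosine(prompt, articles[k]))
print(nearest)  # → article_A
```

A small nudge to the prompt vector can flip which article "wins", which is one way to picture why inconsistent sources produce inconsistent outputs.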
« Connaître est reconnaître » ("To know is to recognize.")
I don’t think essentialist explanations about how LLMs work are very helpful. It doesn’t give any meaningful explanation of the high level nature of the pattern matching that LLMs are capable of. And it draws a dichotomic line between basic pattern matching and knowledge and reasoning, when it is much more complex than that.
1 reply →
editor here! all questions welcome - this is a topic i've been pursuing in the podcast for much of the past year... links inside.
I found it to be an interesting angle but thought it was odd that a key point is "LLMs dominate chess-like domains" while LLMs are not great at chess https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...
i mean, right there in the top update:
> UPD September 15, 2025: Reasoning models opened a new chapter in Chess performance, the most recent models, such as GPT-5, can play reasonable chess, even beating an average chess.com player.
1 reply →
Hey! Thanks for the thought provoking read.
It’s a limitation LLMs will have for some time. With multi-turn interactions and long-range consequences, the only way to truly learn and play “the game” is to experience significant amounts of it. Embody an adversarial lawyer, or a software engineer trying to get projects through a giant org..
My suspicion is agents can’t play as equals until they start to act as full participants - very sci fi indeed..
Putting non-humans into the game can’t help but change it in new ways - people already decry slop and that’s only humans acting in subordination to agents. Full agents - with all the uncertainty about intentions - will turn skepticism up to 11.
“Who’s playing at what” is and always was a social phenomenon, much larger than any multi turn interaction, so adding non-human agents looks like today’s game, just intensified. There are ever-evolving ways to prove your intentions & human-ness and that will remain true. Those who don’t keep up will continue to risk getting tricked - for example by scammers using deepfakes. But the evolution will speed up and the protocols to become trustworthy get more complex..
Except in cultures where getting wasted is part of doing business. AI will have it tough there :)
My Sunday morning speculation is that LLMs, and sufficiently complex neural nets in general, are a kind of Frankenstein phenomenon: they are heavily statistical, yet also partly, subtly doing novel computational and cognitive-like processes (such as world models). To dismiss either aspect is a false binary; the scientific question is distinguishing which part of an LLM is which, which at our current level of scientific understanding is virtually like asking when an electron is a wave or a particle.
[dead]
@dang is this allowed?
2 replies →
> AlphaGo or AlphaZero didn’t need to model human cognition. It needed to see the current state and calculate the optimal path better than any human could.
I don't think this is right: To calculate the optimal path, you do need to model human cognition.
At least, in the sense that finding the best path requires figuring out human concepts like "is the king vulnerable", "material value", "rook activity", etc. We have actual evidence of AlphaZero calculating those things in a way that is at least somewhat like humans do:
https://arxiv.org/abs/2111.09259
So even chess has "hidden state" in a significant sense: you can't play well without calculating those values, which are far from the surface.
I'm not sure there's a clear line between chess and poker like the author assumes.
(author here) great paper to cite.
What i think you are referring to is hidden state as in internal representations. I refer to hidden state in game theoretic terms: private information only one party has. I think we both agree alphazero has hidden state in the first sense.
Concepts like king safety are objectively useful for winning at chess, so alphazero developed them too, no wonder about that. Great example of convergence. However, alphazero did not need to know what i am thinking or how i play to beat me. In poker, you must model a player's private cards and beliefs.
I see now, thanks. Yes, in poker you need more of a mental model of the other side.
a machine cannot understand the smell of a flower until it smells one..
I'm not sure? A person blind from birth has an understanding of color -- though it's a bit complicated: https://news.harvard.edu/gazette/story/2019/02/making-sense-...
People forget that evolution has almost certainly hard-coded certain concepts and knowledge deep into our brains. That deep knowledge will probably not be easy to translate into language, and probably isn't linguistic either, but we know it has to be there for at least some things.
emm.. i will read this. interesting.
Ten years ago it seemed obvious where the next AI breakthrough was coming from: it would be DeepMind using C51 or RAINBOW and PBT to do Alpha-something, the evals would be sound, and it would be superhuman on something important.
And then "Language Models are Few-Shot Learners" collided with Sam Altman's ambition/unscrupulousness, and now TensorRT-LLM is dictating the shape of data centers in a self-reinforcing loop.
LLMs are interesting and useful but the tail is wagging the dog because of path-dependent corruption arbitraging a fragile governance model. You can get a model trained on text corpora to balance nested delimiters via paged attention if you're willing to sell enough bonds, but you could also just do the parse with a PDA from the 60s and use the FLOPs for something useful.
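The PDA in question really is a few lines; a standard stack-based delimiter matcher:

```python
def balanced(s: str) -> bool:
    """Check nested delimiters with a plain stack -- the classic
    pushdown-automaton approach, no attention or FLOPs required."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # balanced iff nothing is left open

print(balanced("f(a[0], {b: (c)})"))  # → True
print(balanced("f(a[0)]"))            # → False
```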
We had it right: dial in an ever-growing set of tasks, opportunistically unify on durable generalities, put in the work.
Instead we asserted generality, lied about the numbers, and lit a trillion dollars on fire.
We've clearly got new capabilities, it's not a total write-off, but God damn was this an expensive way to spend five years making two years of progress.
> lit a trillion dollars on fire
Not all bad then. Hopefully this impedes them getting more.
Ok, I have a question: if adversarial game theory helps neural nets learn world models, then why can't logic help? After all, the former is just a special case of the latter.
Great article, capturing some really important distinctions and successes/failures.
I've found ChatGPT, especially "5.2 Thinking", to be very helpful in the relatively static world of fabrication. CNC cutting parameters for a new material? Gets me right in the neighborhood in minutes (not perfect, but good parameters to start). Identifying materials to complement something I have to work with? Again, like a smart assistant. Same for generating lists of items I might be missing in prepping for a meeting or proposal.
But the high-level attorney in the family? Awful, and definitely in the ways identified (the biglaw firm is using MS derivative of OpenAI) - it thinks only statically.
BUT, it is also far worse than that for legal. And this is not a problem of dynamic vs. static or world model vs word model.
This problem is the ancient rule of Garbage In Garbage Out.
In any legal specialty there are a small set of top-level experts, and a horde of low-level pretenders who also hang out their shingle in the same field. Worse yet, the pretenders write a LOT of articles about the field to market themselves as experts. These self-published documents look good enough to non-lawyers to bring in business. But they are often deeply and even catastrophically wrong.
The problem is that LLMs ingest ALL of them with credulity, and LLM's cannot or do not tell the difference. So, when an LLM composes something, it is more likely to lie to you or fabricate some weird triangulation as it is to compose a good answer. And, unless you are an EXPERT lawyer, you will not be able to tell the difference until it is far too late and the flaw has already bitten you.
It is only one of the problems, and it's great to have an article that so clearly identifies it.
Great article. Executive summary: chocolate teapot is just about able to hold cold water.
So at the moment a combination of expert and LLM is the smartest move: the LLM can deal with the 80% of situations which are like chess, and the expert deals with the 20% of situations which are like poker.
> what survives contact with a self-interested opponent?
In the work environment the optimal strategy will be parameterised culturally.
Companies have different cultures - both at the company level and at the country level.
In some places self-interest is the accepted driving force, in others if you behave like that you'll find yourself quietly ostracised.
For example, I'm not sure Trump understands this.
I’m always wary of anything that has such a clear example of a case LLMs “don’t do” that is trivially achieved by saying “review”.
The Slack message result, for example, prompting only that they’re the lead designer and nothing else (the feedback is clearer and better if I also say they’re notoriously overloaded; this is without that):
This is a very polite and respectful start, which is great since you are new. However, from a professional standpoint, it is a little too passive and vague. In a busy work environment, saying "no rush at all" or "whenever" often leads to your request being buried at the bottom of a to-do list. Additionally, you haven't told Priya exactly what she is looking at or where to find it. Here is a breakdown of how to strengthen this message to show you are organized and respectful of her time without being pushy.

Key Improvements Needed

* Include the Link: Never make someone ask "Where are the files?" Always drop the Figma/Drive link immediately.
* Be Specific: "Any feedback" is mentally taxing. Direct her attention to what you are actually unsure about (e.g., "the color palette," "the user flow," "the spacing").
* Soft Deadline: Instead of "whenever," give a target. This helps her prioritize her week.

Option 1: The Balanced Approach (Recommended). This version strikes the perfect balance between politeness and productivity.

> "Hi Priya! I’ve put together the initial mockups for [Project Name]. [Link to file] When you have a moment, I’d love your feedback on [specific element, e.g., the layout of the landing page]. I’m hoping to wrap this stage up by [Day/Time], but let me know what works for your schedule. Thanks!"

Option 2: The "Respectful of Time" Approach. Use this if you know she is swamped, but you still want to make progress.

> "Hi Priya, hope you're having a good week. I've drafted the mockups for [Project Name] here: [Link]. No major rush, but I’d appreciate your eyes on the [specific section] before I move on to high-fidelity. If it’s easier, feel free to leave comments directly in the file rather than messaging back. Thanks!"

Option 3: The "Quick Check" Approach. Use this if you just need a sanity check before spending more time on it.

> "Hi Priya, could I get a quick gut check on these mockups? [Link]. I want to make sure I’m aligning with the new brand guidelines before I build out the rest of the screens. Does this direction look right to you?"

A Quick Checklist Before You Send

* [ ] Did you change the permissions? Make sure the link is accessible (view/comment access) so she doesn't have to request access.
* [ ] Is the file clean? Delete your "scratchpad" artboards or clearly label the one you want her to review so she doesn't look at the wrong version.

Would you like me to help you draft the specific sentence regarding the "specific element" you want her to critique?
> Humans can model the LLM. The LLM can’t model being modeled
Can’t they? Why not?
I see claims like this so often, and they amount to the idea that LLMs lack metacognition (thinking about their own thinking / self-reflection). Of course the obvious rebuttal is: ask them to do exactly that -- they're shockingly good at it!
Large embedding model
The problem will always be the training data. We can have LLMs because we have the web.
Can we get to another level without a corresponding massive training set that demonstrates those abilities?
I would say the IMO results demonstrated that. The silver-medal result came from a tiny 3B model.
All of our theorem provers had no way to approach silver medal performance despite decades of algorithmic leaps.
The learning stage for transformers demonstrated, a while ago, some insanely good distributed jumps into good areas of combinatorial structures. Inference is just much faster than the inference of algorithms that aren't heavily informed by data.
It's just a fully different distributed algorithm, where we probably can't even extract one working piece without breaking the performance of the whole.
The world/word model distinction is just not the issue there. Gradient descent obviously landed on a distributed representation of an algorithm that does search.
Llame Word Models.
[flagged]
"Eschew flamebait. Avoid generic tangents."
https://news.ycombinator.com/newsguidelines.html
Not sure about that; I'd sooner say the Western reductionism here is the assumption that all thinking/modeling is primarily linguistic and conscious. This article does NOT clearly fall into that trap.
A more "Eastern" perspective might recognize that much deep knowledge cannot be encoded linguistically ("The Tao that can be spoken is not the eternal Tao", etc.), and there is more broad recognition of the importance of unconscious processes and change (or at least more skepticism of the conscious mind). Freud was the first real major challenge to some of this stuff in the West, but nowadays it is more common than not for people to dismiss the idea that unconscious stuff might be far more important than the small amount of things we happen to notice in the conscious mind.
The (obviously false) assumptions about the importance of conscious linguistic modeling are what lead people to say (obviously false) things like "How do you know your thinking isn't actually just like LLM reasoning?".
All models have multimodality now, it's not just text, in that sense they are not "just linguistic".
Regarding conscious vs non-conscious processes:
Inference is actually a non-conscious process, because nothing is observed by the model.
Autoregression is a conscious process, because the model observes its own output, i.e., it has self-referential access.
I.e., models use both: early/mid layers perform highly abstracted non-conscious processing.
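The structural distinction the comment draws can be made concrete with a toy sketch (the lookup-table "model" and all names here are hypothetical, not any real API): a single forward pass never sees its own output, while the autoregressive decoding loop feeds each emitted token back into the context, which is the self-referential access being described.

```python
# Toy "model": just a lookup over the last context token. The point is
# structural, not about model quality.
NEXT = {"the": "cat", "cat": "sat", "sat": "down"}  # hypothetical transitions

def forward(context):
    """One forward pass: context in, next token out. No feedback occurs
    inside this call -- the model never observes what it produces here."""
    return NEXT.get(context[-1], "<eos>")

def autoregress(prompt, steps):
    """Decoding loop: each emitted token re-enters the context, so later
    passes condition on the model's own earlier output."""
    context = list(prompt)
    for _ in range(steps):
        token = forward(context)   # the pass itself is feedback-free
        context.append(token)      # self-reference happens here, outside it
    return context

print(autoregress(["the"], 3))  # ['the', 'cat', 'sat', 'down']
```

Note that the self-reference lives entirely in the loop, not in `forward` itself, which matches the comment's split between the non-conscious pass and the "conscious" autoregressive process.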
>"Westerners are trying so hard to prove that there is nothing special about humans."
I am not really fond of us "westerners", but judging by how many "easterners" treat their populace, they seem to confirm the point
Read a boring book.
Or the opposite, that humans are somehow super special and not as simple as a prediction feedback loop with randomizations.
How do you manage to get that from the article?
Not from the article. Comments don't have to work this way.
you realize ankit is from india and i'm from singapore right lol
another "noahpinion"
The article basically claims that LLMs are bad at politics and poker, neither of which is true (at least if they receive some level of reinforcement learning after sweep training)
Top LLMs are still very bad at poker, see this breakdown of a recent Kaggle experiment: <https://www.youtube.com/watch?v=jyv1bv7JKIQ>
What do you mean by sweep training here?
> The finance friend and the LLM made the same mistake: they evaluated the text without modelling the world it would land in.
Major error. The LLM made that text without evaluating it at all. It just parroted words it previously saw humans use in superficially similar word contexts.
I think this debate is mis-aimed. Both sides are right about different things, and wrong in the same way.
The mistake is treating “model” as a single property, instead of separating cognition from decision.
LLMs clearly do more than surface-level word association. They encode stable relational structure: entities, roles, temporal order, causal regularities, social dynamics, counterfactuals. Language itself is a compressed record of world structure, and models trained on enough of it inevitably internalize a lot of that structure. Calling this “just a word model” undersells what’s actually happening internally.
At the same time, critics are right that these systems lack autonomous grounding. They don’t perceive, act, or test hypotheses against reality on their own. Corrections come from training data, tools, or humans. Treating their internal coherence as if it were direct access to reality is a category error.
But here’s the part both sides usually miss: the real risk isn’t representational depth, it’s authority.
There’s a difference between:
cognition: exploring possibilities, tracking constraints, simulating implications, holding multiple interpretations; and
decision: collapsing that space into a single claim about what is, what matters, or what someone thinks.
LLMs are quite good at the first. They are not inherently entitled to the second.
Most failures people worry about don’t come from models lacking structure. They come from models (or users) quietly treating cognition as decision:
coherence as truth,
explanation as diagnosis,
simulation as fact,
“this sounds right” as “this is settled.”
That’s why “world model” language is dangerous if it’s taken to imply authority. It subtly licenses conclusions the system isn’t grounded or authorized to make—about reality, about causation, or about a user’s intent or error.
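The cognition/decision split being argued for can be sketched as a minimal gate (all names here, `Hypothesis`, `decide`, `grounded`, are hypothetical illustrations, not any real system): the model may freely produce coherent hypotheses, but promoting one to an asserted claim requires an external grounding check supplied by the caller, never coherence alone.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Hypothesis:
    text: str
    coherence: float  # how good it *sounds*; deliberately not evidence of truth

def decide(h: Hypothesis, grounded: Callable[[str], bool]) -> Optional[str]:
    """Collapse a hypothesis into a claim only with external grounding.

    Coherence never promotes a hypothesis, no matter how high:
    'this sounds right' is not allowed to become 'this is settled'.
    """
    if grounded(h.text):
        return h.text   # externally verified: safe to assert
    return None         # stays a hypothesis, however fluent

# A maximally fluent hypothesis with no external verification stays a hypothesis.
fluent = Hypothesis("The bug is in the cache layer.", coherence=0.97)
print(decide(fluent, grounded=lambda claim: False))  # None: no authority yet
```

The design choice is that `coherence` is stored but never read by `decide`, which is exactly the failure mode above inverted: the system cannot quietly treat "sounds right" as "is settled".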
A cleaner way to state the situation is:
> These systems build rich internal representations that are often world-relevant, but they do not have autonomous authority to turn those representations into claims without external grounding or explicit human commitment.
Under that framing:
The “word model” camp is right to worry about overconfidence and false grounding.
The “world model” camp is right that the internal structure is far richer than token statistics.
They’re arguing about different failure modes, but using the same overloaded word.
Once you separate cognition from decision, the debate mostly dissolves. The important question stops being “does it understand the world?” and becomes “when, and under what conditions, should its outputs be treated as authoritative?”
That’s where the real safety and reliability issues actually live.