Comment by badmonster

2 days ago

Why do LLMs struggle so much with recovering from early wrong turns in multi-turn conversations — even when all prior context is available and tokenized?

Is it due to the model's training distribution (mostly single-shot completions), the way context windows are encoded, or an architectural bottleneck?

Feels like there's no dynamic internal state that evolves over the conversation — only a repeated re-parsing of static history. Has anyone seen work on integrating memory/state mechanisms that allow belief revision within a session, not just regurgitation of past tokens?

We shouldn’t anthropomorphize LLMs—they don’t “struggle.” A better framing is: why is the most likely next token, given the prior context, one that reinforces the earlier wrong turn?

Imagine optimizing/training almost entirely on the happy path: the training dialogues show correct continuations, so nearly every history the model learns to condition on is "happy."

When it generates future tokens at inference time, it expects to be looking at that same kind of happy history.

So how can a model, handed sad tokens (a context that already contains a wrong turn), generate future happy tokens if it never learned to do so?
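To make the distribution-mismatch point concrete, here's a toy sketch (all data and helper names are hypothetical, not any lab's actual pipeline) contrasting a typical "happy" training dialogue with a synthesized "recovery" dialogue, where a wrong turn is deliberately placed in the context and the target continuation corrects it. Only if examples like the second kind exist does the model ever get gradients for "produce happy tokens given a sad history."

```python
# Toy sketch, hypothetical data and helpers: most instruction-tuning dialogues
# look like `happy_example`, so the model rarely conditions on a prefix that
# contains a mistake. One mitigation is to synthesize "recovery" examples where
# an early wrong turn appears in the context and the target corrects it.

happy_example = [
    {"role": "user", "content": "Sort this list: [3, 1, 2]"},
    {"role": "assistant", "content": "[1, 2, 3]"},
]

recovery_example = [
    {"role": "user", "content": "Sort this list: [3, 1, 2]"},
    {"role": "assistant", "content": "[3, 2, 1]"},              # the "sad" turn: a wrong answer left in context
    {"role": "user", "content": "That's descending order."},
    {"role": "assistant", "content": "You're right, ascending order is [1, 2, 3]."},  # training target
]

def to_training_pair(dialogue):
    """Condition on everything before the last assistant turn, predict that turn."""
    context, target = dialogue[:-1], dialogue[-1]["content"]
    return context, target

# The loss only ever teaches "recover from a wrong turn" on pairs like this one.
print(to_training_pair(recovery_example))
```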

The work you're looking for is already here: it's "thinking." I assume they include sad tokens in the dataset, have the model produce "thinking" tokens, and expect happy tokens to come after the thinking. If the thinking is bad (judged by whether happy tokens actually follow), it gets penalized; if it's good, it gets reinforced via gradient descent.
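A minimal sketch of that reward-the-thinking-by-its-outcome idea, assuming a REINFORCE-style update over a toy policy (two hand-made "thinking strategies" and a hard-coded environment stand in for token-level generation and a verifier; none of this is any specific lab's recipe):

```python
# Sample a "thinking" strategy, produce an answer, score only the final answer,
# and push probability toward whichever thinking preceded a correct answer.
import math
import random

random.seed(0)

STRATEGIES = [
    "re-check the earlier turn for mistakes",  # tends to recover from the wrong turn
    "trust the earlier turn as given",         # tends to repeat the error
]

def answer_is_correct(strategy: str) -> bool:
    # Toy "environment": re-checking usually fixes the early wrong turn.
    p_correct = 0.9 if strategy.startswith("re-check") else 0.2
    return random.random() < p_correct

logits = [0.0, 0.0]   # log-linear policy over thinking strategies
lr = 0.5

def probs(logits):
    z = [math.exp(l) for l in logits]
    s = sum(z)
    return [x / s for x in z]

for step in range(200):
    p = probs(logits)
    i = 0 if random.random() < p[0] else 1                  # sample the thinking
    reward = 1.0 if answer_is_correct(STRATEGIES[i]) else 0.0
    baseline = 0.5                                          # crude variance reduction
    # REINFORCE for a softmax policy: d log pi(i) / d logit_j = 1{j==i} - p_j
    for j in range(2):
        grad = (1.0 - p[j]) if j == i else -p[j]
        logits[j] += lr * (reward - baseline) * grad

print(probs(logits))  # probability mass shifts toward the strategy whose answers score well
```

The key design point matches the comment above: the thinking tokens themselves are never labeled good or bad directly; they only get credit or blame through the happy (or sad) tokens that follow them.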