Comment by morsecodist
1 day ago
This matches my experience exactly. "poisoned" is a great way to put it. I find once something has gone wrong all subsequent responses are bad. This is why I am iffy on ChatGPT's memory features. I don't notice it causing any huge problems but I don't love how it pollutes my context in ways I don't fully understand.
It's interesting how much the nature of LLMs as autoregressive next-token predictors, feeding their own output back in as input, aligns with the Chinese Room thought experiment. [1] In that experiment it also makes perfect sense that a single wrong response would cascade into a series of increasingly drifting errors. I think it all emphasizes the relevance of the otherwise hard-to-pin-down concept of 'understanding.'
In many ways this issue could make the Chinese Room thought experiment even more compelling, because it's a very practical and inescapable problem.
[1] - https://en.wikipedia.org/wiki/Chinese_room
I don't think the Chinese Room thought experiment is about this, or about the performance of LLMs in general. Searle explicitly argues that a program can't produce "understanding" even if it mimics human understanding perfectly, because programs don't have the "causal powers" needed to generate "mental states".
This is mentioned in the Wikipedia page too: "Although its proponents originally presented the argument in reaction to statements of artificial intelligence (AI) researchers, it is not an argument against the goals of mainstream AI research because it does not show a limit in the amount of intelligent behavior a machine can display."
Great comment on the Chinese Room. That idea seems to be dismissed nowadays, but the concept of "cascading failure to understand context" is absolutely relevant to LLMs. I often find myself needing to explain basic details over and over again to an LLM, whereas with a person it would be a five-second "no, I mean like this way, not that way" explanation.
I find that tools like LMStudio, which let you edit your chat history on the fly, really help with this problem. The models you can host locally are much weaker, but once you factor in these poisoning problems they can end up performing a little better than the really big models.
A nice middle ground I'm finding is to start the conversation with Claude in its "thinking" mode, then copy/paste that conversation into LMStudio and have a weaker model like Gemma pick up where Claude left off.
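For what it's worth, that editing-and-handoff trick can also be scripted: LMStudio exposes an OpenAI-compatible local server, so you can paste the edited transcript into a messages list and let the local model continue from there. A rough sketch, assuming the server is running on its default port and a Gemma model is loaded (the model name and transcript contents below are placeholders):

    # Sketch: continue a cleaned-up Claude transcript with a local model via
    # LMStudio's OpenAI-compatible server (default http://localhost:1234/v1).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    # The pasted transcript, with any "poisoned" turns edited out or rewritten
    # before the conversation continues.
    messages = [
        {"role": "system", "content": "You are a careful assistant."},
        {"role": "user", "content": "Here is the problem we were working on: ..."},
        {"role": "assistant", "content": "<Claude's initial 'thinking' answer, pasted in>"},
        {"role": "user", "content": "Pick up from here and work out the next step."},
    ]

    response = client.chat.completions.create(
        model="gemma-2-9b-it",  # placeholder; use whatever identifier LMStudio shows
        messages=messages,
    )
    print(response.choices[0].message.content)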
I have very limited experience with LLMs, but I've always thought of it as a compounding-errors problem: once a small error creeps in early on, it can compound and send things completely off track later.
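To put a toy number on that intuition (the per-turn probability here is completely made up): if each response independently stays on track with probability p, the chance that a long conversation never derails shrinks geometrically, and in practice it's worse than this because an early error also makes later errors more likely.

    # Toy model of compounding errors: assume each turn independently stays
    # "on track" with probability p (0.95 is an arbitrary illustrative value).
    p = 0.95
    for turns in (5, 10, 20, 50):
        print(turns, round(p ** turns, 2))
    # prints: 5 0.77, 10 0.6, 20 0.36, 50 0.08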
Good point on the memory feature. Wow, that sounds terrible.
The memory is easy to turn off. It sounded like a very bad idea to cross-contaminate chats, so I disabled it as soon as ChatGPT introduced it.