Naive question, but isn’t every output token generated in roughly the same non-deterministic way? Even if it uses its actual history as context, couldn’t the output still be incorrect?
Not trolling, asking as a regular user
Have you ever seen those posts where AI image generation tools completely fail to generate an image of the Leaning Tower of Pisa straightened out? Every single time, they generate the tower, well… leaning. (With the exception of some more recent advanced models, of course.)
From my understanding, this is because modern AI models are basically pattern extrapolation machines. Humans are too, by the way. If every time you eat a particular kind of berry, you crap your guts out, you’re probably going to avoid that berry.
That is to say, LLMs are trained to give you the most likely text (their response) that follows some preceding text (the context). In my experience, if the LLM agent loads a history of the commands it ran into its context, and one of those commands is a deletion command, the subsequent text is almost always “there was a deletion.” Which makes sense!
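To make that concrete, here’s a toy sketch of what “most likely next text” means. None of this is any real model’s code and the scores are made up; it just shows why sampling is non-deterministic while still heavily favoring the continuation that actually fits the context:

```python
import math, random

def sample_next_token(scores, temperature=1.0):
    # scores: made-up raw scores (logits) a model might assign to candidate continuations
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}  # numerically stable softmax
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Weighted random draw: likely continuations dominate, unlikely ones remain possible.
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Hypothetical scores after context like "...the agent ran `rm -rf build/`, and then":
scores = {"there was a deletion": 9.0, "nothing was deleted": 2.0, "it started to rain": 0.5}
print(sample_next_token(scores, temperature=0.7))
```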
So while yes, it is theoretically possible for things to go sideways and for it to hallucinate in some weird way (which grows increasingly likely if there’s a lot of junk clogging the context window), in this case I get the impression it’s close to impossible to get a faulty response. But close to impossible ≠ impossible, so precautions are still essential.
Yes, but Claude Cowork isn't just an LLM. It's a sophisticated harness wrapped around the LLM (Opus 4.5, for example). The harness does a ton of work to keep the number of tokens sent and received low, and to keep the context carried between calls small. This applies to other coding agents to varying extents as well.
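Roughly what I mean by “harness,” as a toy sketch (illustrative only, not Anthropic's actual loop; the message format and the fake model here are invented): the harness, not the model, executes the tools and splices their raw output back into the context:

```python
def run_agent(call_model, tools, messages):
    while True:
        reply = call_model(messages)        # model proposes plain text or a tool call
        messages.append(reply)
        if "tool" not in reply:             # plain text reply: this turn is done
            return reply["text"]
        result = tools[reply["tool"]](**reply["args"])      # harness runs the tool itself
        messages.append({"role": "tool", "text": result})   # raw tool output goes into context

# Tiny stand-in "model": asks for one tool call, then summarizes the tool's output.
def fake_model(messages):
    if messages[-1].get("role") == "tool":
        return {"role": "assistant", "text": "Tool said: " + messages[-1]["text"]}
    return {"role": "assistant", "tool": "echo", "args": {"text": "hello"}}

print(run_agent(fake_model, {"echo": lambda text: text}, [{"role": "user", "text": "hi"}]))
```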
Asking for the trace likely just involves the LLM telling the harness to call some tools: for example, calling the Bash tool with grep to find the line numbers of the command in the trace file. It can repeat this until it thinks it has found the right block. Those line numbers are then passed to the Read tool (by the harness) to get the command(s), and finally the output of that read is added to the response by the harness.
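As a rough illustration of that grep-then-read step (the helper names and the trace filename below are my own invention, not Claude Cowork's actual tools), the key point is that what comes back is the file's contents verbatim, not the model's memory of them:

```python
import subprocess

def grep_line_numbers(pattern, path):
    # `grep -n` prints matches as "LINENO:text"; keep just the line numbers.
    result = subprocess.run(["grep", "-n", pattern, path], capture_output=True, text=True)
    return [int(line.split(":", 1)[0]) for line in result.stdout.splitlines()]

def read_lines(path, start, end):
    # Return lines start..end (1-indexed, inclusive) exactly as they sit on disk.
    with open(path) as f:
        return [line.rstrip("\n") for i, line in enumerate(f, 1) if start <= i <= end]

# Hypothetical usage against a hypothetical trace file:
hits = grep_line_numbers("rm -rf", "session_trace.log")
if hits:
    print("\n".join(read_lines("session_trace.log", hits[0], hits[0] + 5)))
```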
The LLM doesn't get a chance to reinterpret or hallucinate until it's telling you how very sorry it is for what happened. Also, the moment it originally wrote (hallucinated?) the commands is when it made its oopsy.