Comment by StrauXX
4 hours ago
A lot of comments here are dismissing this post because the relevant code was isolated. But that's exactly what Anthropic did with Mythos! They describe their (very lean) harness in the Anthropic Red Mythos blog post. The harness first assigns each file in the given codebase an importance value. It then points Claude Code at the codebase with a prompt telling it to focus on that file, spawning a Claude Code instance for each file in the codebase.
So no, the fact that the posters isolated the relevant code does not invalidate their findings.
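For anyone who hasn't read that post, here is a rough sketch of the per-file fan-out harness described above. It is an illustration under assumptions, not Anthropic's code. The keyword-count `toy_importance` scorer, the prompt wording, and `run_agent` (a stand-in for launching one agent session and collecting its report) are all hypothetical. It also assumes the importance value is only used to order the work, since the comment doesn't say how it is used.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileTask:
    path: str
    importance: int
    prompt: str


def toy_importance(source: str) -> int:
    # Hypothetical heuristic: count a few memory-handling calls.
    # The real harness's scoring method is not described in the comment.
    risky = ("memcpy", "strcpy", "malloc", "free")
    return sum(source.count(word) for word in risky)


def build_tasks(files: dict[str, str],
                score: Callable[[str], int] = toy_importance) -> list[FileTask]:
    tasks = []
    for path, source in files.items():
        # One focused prompt per file; the agent still has the whole codebase.
        prompt = ("Audit this codebase for security vulnerabilities. "
                  f"Focus on the file {path}.")
        tasks.append(FileTask(path, score(source), prompt))
    # Assumption: importance only orders the work, most important first.
    tasks.sort(key=lambda t: t.importance, reverse=True)
    return tasks


def run_harness(files: dict[str, str],
                run_agent: Callable[[str], str],
                max_workers: int = 4) -> dict[str, str]:
    # run_agent is a hypothetical stand-in for spawning one agent instance
    # with the given prompt and returning its findings as text.
    tasks = build_tasks(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        reports = pool.map(lambda t: run_agent(t.prompt), tasks)
        # pool.map yields results in task order, so zip pairs them correctly.
        return {t.path: r for t, r in zip(tasks, reports)}
```

The point is the shape: every file gets its own agent run, and each run is told where to look. That is the same kind of narrowing people are objecting to in this post.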
From the article:
> Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior").
Check the other comments; they didn't.