Comment by StrauXX

2 months ago

A lot of comments here are dismissing this post because the relevant code was isolated. But that's exactly what Anthropic did with Mythos! They describe their (very lean) harness in the Anthropic Red blog post on Mythos [1]. The harness first assigns each file in the given codebase an importance value. It then points Claude Code at the codebase with a prompt telling it to focus on that file, spawning a separate Claude Code instance for each file in the codebase.
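Roughly, the fan-out described above looks like the sketch below. This is not Anthropic's code: the keyword-count `importance` heuristic, the prompt wording, and the `run_agent` callable (a stand-in for launching one Claude Code instance) are all hypothetical, chosen only to show the "rank every file, then one focused agent per file" shape.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical heuristic: a real harness would more likely ask a model to
# rate each file. Here we just count risky-looking keywords.
RISKY_HINTS = ("parse", "alloc", "memcpy", "auth", "decode")


def importance(source: str) -> int:
    """Toy importance score: number of risky keyword occurrences."""
    text = source.lower()
    return sum(text.count(hint) for hint in RISKY_HINTS)


def build_prompt(path: str) -> str:
    """Agent sees the whole repo, but the prompt focuses it on one file."""
    return (
        "You have access to the full repository. Focus your vulnerability "
        f"search on {path}, following calls into other files as needed."
    )


def run_harness(
    files: dict[str, str],
    run_agent: Callable[[str], str],
    max_workers: int = 4,
) -> list[tuple[str, int, str]]:
    """Rank files by importance, then run one agent per file, highest first."""
    ranked = sorted(files, key=lambda p: importance(files[p]), reverse=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as `ranked`.
        reports = list(pool.map(lambda p: run_agent(build_prompt(p)), ranked))
    return [(p, importance(files[p]), r) for p, r in zip(ranked, reports)]
```

Note that each agent still gets the whole codebase; the "isolation" is only in where the prompt tells it to look.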

So no, the fact that the posters isolated the relevant code does not invalidate their findings.

[1] https://red.anthropic.com/2026/mythos-preview/

3 comments


felipeerias 2 months ago

From the article:

> Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior").

grandinquistor 2 months ago

I mean, you can still scale that, right? Ask a lighter model to go through every function looking for vulnerabilities, then pass its output to a bigger model like Opus to classify the critical ones.
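A minimal sketch of that two-stage triage, assuming hypothetical `cheap_screen` and `strong_classify` callables that would wrap a small model and a larger one (e.g. Opus). The string checks in the test are stand-ins, not real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    function: str
    severity: str


def triage(
    functions: dict[str, str],
    cheap_screen: Callable[[str], bool],
    strong_classify: Callable[[str], str],
) -> list[Finding]:
    """Stage 1: a cheap model screens every function.
    Stage 2: only the flagged ones go to the expensive model for a severity label."""
    flagged = [name for name, src in functions.items() if cheap_screen(src)]
    findings = [Finding(name, strong_classify(functions[name])) for name in flagged]
    # Keep only what the strong model labels critical.
    return [f for f in findings if f.severity == "critical"]
```

The cost saving comes from the expensive model only seeing what the cheap pass flags; the obvious risk is that anything the cheap model misses never reaches stage 2.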

make_it_sure 2 months ago

Check the other comments; they didn't.