Comment by mips_avatar
9 days ago
They've said that they'll stop notifying developers when this gets triggered; instead they'll load in basically like a LORA that's designed to inject bugs into your code.
Anthropic wants to stop training models and ride out Mythos / Fable for as long as possible.
They are trying to expand the 6-18 month lead they have over China-based models. Could the gap widen to, say, 24 months?
Their gap over Chinese models like GLM-5.1 is nowhere near 18 months. In many areas, it’s less than 6 months. The best closed models 18 months ago were worse than Qwen3.6.
These coding agent models only started getting useful in January. Before that they were a difficult-to-control autocomplete, and not very smart.
January was an inflection point, and no open weights model has crossed over that same threshold.
This is definitely recursive self improvement territory, except that we're prohibited from participating.
It feels like the capability gap is wider than before.
> a LORA that's designed to inject bugs into your code
A statement like this clearly requires a reference.
From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning", aka they will take your ML research code and inject bugs into it until it breaks, using a LORA (or some other form of PEFT).
Are they trying to fight back against model distillation?
“Limit effectiveness” could mean introducing performance degradation in your code, which is arguably some sort of performance bug (ML code is supposed to be high performance, so I’d call unnecessary degradation a bug), but it could be borderline.
Thanks, I thought maybe I missed something. That's an interesting way to interpret that.