Comment by mips_avatar

8 days ago

From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning" aka they will take your ML research code and inject bugs into it until it breaks, using a LoRA (or some other form of PEFT).

sciencejerk 8 days ago

Are they trying to fight back against model distillation?

bee_rider 8 days ago

“Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.

rurban 8 days ago

No, it is just a prominent "Cyber Security threat detected" blocker, with a button to appeal. I appealed because my work had nothing to do with either cyber or security, but the appeal was auto-closed. So no more Claude for this work.

nomel 8 days ago

Thanks, I thought maybe I missed something. That's an interesting way to interpret that.

mips_avatar 8 days ago
Anthropic is trying to hide bad behavior by being vague, it's important to not be vague when calling it out.
- nomel 8 days ago
  
  I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?
  
giancarlostoro 8 days ago
PEFT is a library; one of its capabilities is producing LoRAs.
See:
https://heidloff.net/article/efficient-fine-tuning-lora/
- adw 8 days ago
  
  It's just an acronym: "parameter-efficient fine-tuning". LoRA is one method, prefix tuning is another, and there are more.
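  For readers unfamiliar with the technique: LoRA (low-rank adaptation) keeps a layer's pretrained weight matrix W frozen and trains only a low-rank update B·A, where the rank r is much smaller than the layer's dimensions, scaled by alpha/r. Below is a minimal pure-Python sketch of that idea for a single linear layer. The names `LoRALinear` and `lora_param_count` are hypothetical and are not the API of the Hugging Face PEFT library or any other package.

```python
# Minimal sketch of a LoRA-style linear layer in pure Python (no numpy/torch).
# LoRALinear and lora_param_count are hypothetical names for illustration only.
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained W, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A: (rank, d_in) with small random init; B: (d_out, rank) zero-initialized,
        # so B @ A == 0 and the adapter starts as a no-op on the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)              # W x (frozen path)
        delta = matvec(self.B, matvec(self.A, x))  # B (A x) (trainable low-rank path)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning; W stays fixed.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-r adapter: r*d_in (A) + d_out*r (B)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1, 2], [3, 4]], rank=1)
    print(layer.forward([1.0, 1.0]))  # same as W x, because B starts at zero
    print(lora_param_count(1024, 1024, 8), 1024 * 1024)
```

  For a 1024×1024 layer at rank 8, the adapter has 8 × (1024 + 1024) = 16,384 trainable parameters versus 1,048,576 in the full matrix, which is where the "parameter-efficient" part of the name comes from.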