Comment by ffsm8
6 hours ago
> There's no way for third parties to be sure of anything in this kind of situation,
Not just third parties, but also the first party can't be sure of anything - just as he said. This entire article was speculation because there was no other way to figure out what could've caused the ban.
> where one party shares only the information they wish and the other side stays silent as a matter of default corporate policy.
I don't think that's a fair viewpoint - because it implies that relevant information was omitted on purpose.
From my own experience with Anthropic, I believe his story is likely true.
I mean they were terminating sessions left and right all summer/fall because of "violations"... Like literally writing "hello" as the first prompt in a clean project and getting the session terminated.
This has since been mostly resolved, but I bet there are still edge cases in their janky "safety" measures. And looking at the linked claude.md, his theory checks out to me. I mean he was essentially doing what was banned in the TOS - iteratively finding ways to lead the model to do something other than what it was initially going to do.
If his end goal was to write malware that does, essentially, prompt injection... He'd go at it exactly like this. Hence I can sure as hell imagine Anthropic writing a prompt that analyzes sessions to identify bad actors, and that prompt catching him.