Comment by ffsm8

1 month ago

> There's no way for third parties to be sure of anything in this kind of situation,

Not just third parties, but also the first party can't be sure of anything - just as he said. This entire article was speculation because there was no other way to figure out what could've caused the ban.

> where one party shares only the information they wish and the other side stays silent as a matter of default corporate policy.

I don't think that's a fair viewpoint - because it implies that relevant information was omitted on purpose.

From my own experience with Anthropic, I believe his story is likely true.

I mean, they were terminating sessions left and right all summer/fall because of "violations"... Like literally writing "hello" as the first prompt in a clean project and getting the session terminated.

This has since been mostly resolved, but I bet there are still edge cases in their janky "safety" measures. And looking at the linked claude.md, his theory checks out to me. I mean, he was essentially doing what was banned in the TOS - iteratively finding ways to lead the model into doing something other than what it was initially going to do.

If his end goal was to write malware that essentially does prompt injection... he'd go at it exactly like this. Hence I can sure as hell imagine Anthropic writing a prompt that analyzes sessions to identify bad actors, and that prompt catching him.
