Comment by citrusx
3 days ago
They're spinning this as a positive learning experience and trying to make themselves look good. But make no mistake: this was a failure on Anthropic's part to prevent this kind of abuse through their systems in the first place. They shouldn't be getting any credit for it.
They didn't have to disclose any of this. In my opinion it was a fair and reasonably thorough overview of a system fault.
Meh, drama aside, I'm actually curious what the true capabilities of a system that skips "safety" alignment entirely would be. Like an all-out "mil-spec" agent: feed it everything, RL it to own boxes, and let it loose in an air-gapped network to see what it can really do.
We know alignment hurts model performance (OpenAI people have said it, Microsoft people have said it). We also know companies train models on their own code (Google had a blog post about it recently). I'd bet good money Project Zero has something like this in its sights.
I don't think we're that far from blue and red agents fighting and RLing off each other in a loop.
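To be concrete (a toy sketch, all names and numbers invented): the bare-bones version of that loop is just two epsilon-greedy bandit agents co-training on a zero-sum probe-vs-patch game.

```python
# Toy red-vs-blue self-play loop (illustrative only; agents, game, and
# constants are made up). Red probes one of N services, blue hardens one,
# red scores if it hits an unpatched service. Zero-sum, so each agent's
# reward signal is shaped by the other's current policy.
import random

N_SERVICES = 5
EPSILON = 0.1   # exploration rate
ALPHA = 0.05    # learning rate for incremental value updates

class BanditAgent:
    def __init__(self):
        self.q = [0.0] * N_SERVICES  # estimated value of each action

    def act(self):
        if random.random() < EPSILON:
            return random.randrange(N_SERVICES)              # explore
        return max(range(N_SERVICES), key=lambda a: self.q[a])  # exploit

    def learn(self, action, reward):
        # Standard incremental update toward the observed reward.
        self.q[action] += ALPHA * (reward - self.q[action])

red, blue = BanditAgent(), BanditAgent()
for step in range(100_000):
    probe, patch = red.act(), blue.act()
    r = 1.0 if probe != patch else -1.0  # red wins if the service was unpatched
    red.learn(probe, r)
    blue.learn(patch, -r)                # zero-sum: blue gets the negation

print("red q-values: ", [round(v, 2) for v in red.q])
print("blue q-values:", [round(v, 2) for v in blue.q])
```

Run long enough, both sides collapse toward a mixed strategy, since any predictable policy gets exploited by the other agent. Swap the bandits for LLM agents and the toy game for real exploitation and hardening tasks, and that's the loop I mean.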
I assume this is already happening; incompetence within state-actor systems is the only hurdle. The incentives and geopolitical stakes are too high NOT to do it.
I just pray incompetence wins in the right way, for humanity’s sake.
Cyberpunk has a recurring theme of advanced AI systems attacking and defending against each other, and for good reason.
Nous claims to be doing that, but I haven't seen much discussion of it.