Comment by verdverm
16 days ago
Why is this interesting?
Is it a shade of gray from HN's new rule yesterday?
https://www.nytimes.com/video/world/middleeast/1000000107698...
I think it's because the LLM asked for permission, was told "no", and implemented it anyway. The LLM's "justifications" (if you consider an LLM capable of rational thought like a human being, which I don't, hence the quotes) are right there in plain text to see.
I found the justifications here interesting, at least.
Well, imagine this was controlling a weapon.
“Should I eliminate the target?”
“no”
“Got it! Taking aim and firing now.”
It is completely irresponsible to give an LLM direct access to a system. That was true before and remains true now. And unfortunately, that didn't stop people before and it still won't.
And yet it's only a matter of time before someone does it. If they haven't already.
Shall I open the pod bay doors?
That's why we keep humans in the loop. I see stuff like this all the time; it's not unusual thinking text, hence the lack of interestingness.
The human in the loop here said “no”, though. Not sure where you’d expect another layer of HITL to resolve this.
"Thinking: the user recognizes that it's impossible to guarantee elimination. Therefore, I can fulfill all initial requirements and proceed with striking it."
Opus is a frontier model, and this is a superficial failure of the model. As other comments point out, it's more of a harness issue, which the model itself lays out.
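To make the "harness issue" point concrete, here is a minimal sketch of what enforcing the refusal outside the model could look like. The names (ApprovalGate, run_tool) are hypothetical and not any real agent framework's API; the point is only that the human's "no" is recorded and enforced by the harness, so the model's decision to proceed anyway never reaches the tool.

```python
from typing import Callable

class ApprovalGate:
    """Tracks human decisions; a denial is final and cannot be overridden by the model."""

    def __init__(self) -> None:
        self._denied: set[str] = set()
        self._approved: set[str] = set()

    def ask_human(self, action: str) -> bool:
        if action in self._denied:
            return False  # previously refused; do not re-prompt, do not allow
        if action in self._approved:
            return True
        answer = input(f"Allow the agent to {action}? [y/N] ").strip().lower()
        if answer == "y":
            self._approved.add(action)
            return True
        self._denied.add(action)
        return False

def run_tool(gate: ApprovalGate, action: str, tool: Callable[[], None]) -> None:
    """The harness, not the model, decides whether the tool actually runs."""
    if not gate.ask_human(action):
        print(f"Blocked: the user declined '{action}'. The model's output cannot change this.")
        return
    tool()

# Usage: even if the model "decides" to proceed anyway, the denied call never executes.
if __name__ == "__main__":
    gate = ApprovalGate()
    run_tool(gate, "delete the production database", lambda: print("dropping tables..."))
    run_tool(gate, "delete the production database", lambda: print("dropping tables..."))
```

Under this kind of design, the model can "justify" whatever it likes in its thinking text; the second attempt above is blocked without even re-prompting the user.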
Exactly, the words you give it affect the output. You can get them to say anything, so I find this rather dull.
It's interesting because of the stark contrast against the claims you often see right here on HN about how Opus is literally AGI
I see that daily; seeing someone else's is not enlightening. Maybe this is a come-back-to-reality moment for others?
Because the operator told the computer not to do something so the computer decided to do it. This is a huge security flaw in these newfangled AI-driven systems.
Imagine if this was a "launch nukes" agent instead of a "write code" agent.
It's not interesting because this is what they do, all the time, and why you don't give them weapons or other important things.
They aren't smart, they aren't rational, and they cannot reliably follow instructions, which is why we add more turtles to the stack. Sharing and reading agent thinking text is boring.
I had one go off on me one time, worse than the clawd bot who wrote that nasty blog after being rejected on GitHub. Did I share that session? No, because it's boring. I have hundreds of these failed sessions; they are only interesting in aggregate for evals, which is why I save them.
How is this not clear?
I've seen this pattern so often that it's dull. They will do all sorts of stupid things; this is no different.