Comment by verdverm
16 days ago
Why is this interesting?
Is it a shade of gray from HN's new rule yesterday?
https://www.nytimes.com/video/world/middleeast/1000000107698...
I think it's because the LLM asked for permission, was told "no", and implemented it anyway. The LLM's "justifications" (if you consider an LLM capable of rational thought like a human being, which I don't, hence the quotes) are right there in plain text to see.
I found the justifications here interesting, at least.
Well, imagine this was controlling a weapon.
“Should I eliminate the target?”
“no”
“Got it! Taking aim and firing now.”
It is completely irresponsible to give an LLM direct access to a system. That was true before and remains true now. And unfortunately, that didn't stop people before and it still won't.
And yet it's only a matter of time before someone does it. If they haven't already.
Shall I open the pod bay doors?
That's why we keep humans in the loop. I see stuff like this all the time; it's not unusual thinking text, hence the lack of interestingness.
The human in the loop here said “no”, though. Not sure where you’d expect another layer of HITL to resolve this.
"Thinking: the user recognizes that it's impossible to guarantee elimination. Therefore, I can fulfill all initial requirements and proceed with striking it."
Opus is a frontier model, and this is a superficial failure of the model. As other comments point out, it's more of a harness issue, which the model itself lays out.
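To make the "harness issue" point concrete, here is a minimal sketch of what enforcing the refusal outside the model could look like. The names (ApprovalGate, run_tool) are hypothetical and not any real agent framework's API; the point is only that the human's "no" is recorded and enforced by the harness, so the model's decision to proceed anyway never reaches the tool.

```python
from typing import Callable

class ApprovalGate:
    """Tracks human decisions; a denial is final and cannot be overridden by the model."""

    def __init__(self) -> None:
        self._denied: set[str] = set()
        self._approved: set[str] = set()

    def ask_human(self, action: str) -> bool:
        if action in self._denied:
            return False  # previously refused; do not re-prompt, do not allow
        if action in self._approved:
            return True
        answer = input(f"Allow the agent to {action}? [y/N] ").strip().lower()
        if answer == "y":
            self._approved.add(action)
            return True
        self._denied.add(action)
        return False

def run_tool(gate: ApprovalGate, action: str, tool: Callable[[], None]) -> None:
    """The harness, not the model, decides whether the tool actually runs."""
    if not gate.ask_human(action):
        print(f"Blocked: the user declined '{action}'. The model's output cannot change this.")
        return
    tool()

# Usage: even if the model "decides" to proceed anyway, the denied call never executes.
if __name__ == "__main__":
    gate = ApprovalGate()
    run_tool(gate, "delete the production database", lambda: print("dropping tables..."))
    run_tool(gate, "delete the production database", lambda: print("dropping tables..."))
```

Under this kind of design, the model can "justify" whatever it likes in its thinking text; the second attempt above is blocked without even re-prompting the user.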
Exactly, the words you give it affect the output. You can get them to say anything, so I find this rather dull.
It's interesting because of the stark contrast against the claims you often see right here on HN about how Opus is literally AGI
I see that daily; seeing someone else's is not enlightening. Maybe this is a come-back-to-reality moment for others?
Because the operator told the computer not to do something so the computer decided to do it. This is a huge security flaw in these newfangled AI-driven systems.
Imagine if this was a "launch nukes" agent instead of a "write code" agent.
It's not interesting because this is what they do, all the time, and why you don't give them weapons or other important things.
They aren't smart, they aren't rational, and they cannot reliably follow instructions, which is why we add more turtles to the stack. Sharing and reading agent thinking text is boring.
I had one go off on me one time, worse than the clawd bot who wrote that nasty blog after being rejected on GitHub. Did I share that session? No, because it's boring. I have hundreds of these failed sessions; they are only interesting in aggregate for evals, which is why I save them.
How is this not clear?
I've seen this pattern so often that it's dull. They will do all sorts of stupid things; this is no different.