Comment by ben_w

1 year ago

It reads like you think failing tests can't ever be bad because they're in a test environment?

So it merely knows how to approach the task of deleting its own off-switch but didn't actually pass that command to a real execution environment.

That's already bad because people do sometimes blindly pass commands from the context windows to execution environments.

Should they? No, they should not. Not blindly. But they do.

3 comments

ben_w

z3c0 1 year ago

This isn't a test environment, it's a production scenario where a bunch of people trying to invent a new job for themselves role-played with an LLM. Their measured "defections" were an LLM replying with "well I'm defecting".

OpenAI wants us to see "5% of the time, our product was SkyNet", because that's sexier tech than "5% of the time, our product acts like the chaotic member of your DnD party".

SAI_Peregrinus 1 year ago
Or "5% of the time, our product actually manages to act as it was instructed to act."
- z3c0 1 year ago
  
  Bingo -- or with no marketing swing, "100% of the time, our product exhibits an approximation of human language, which is all it is ever going to do."