Comment by z3c0

7 months ago

This isn't a test environment, it's a production scenario where a bunch of people trying to invent a new job for themselves role-played with an LLM. Their measured "defections" were an LLM replying with "well I'm defecting".

OpenAI wants us to see "5% of the time, our product was SkyNet", because that's sexier tech than "5% of the time, our product acts like the chaotic member of your DnD party".

Or "5% of the time, our product actually manages to act as it was instructed to act."

  • Bingo -- or with no marketing swing, "100% of the time, our product exhibits an approximation of human language, which is all it is ever going to do."