Comment by tbrake
3 months ago
Well, almost always.
There was that brief period in 2023 when Bing just started straight up gaslighting people instead of admitting it was wrong.
https://www.theverge.com/2023/2/15/23599072/microsoft-ai-bin...
I suspect what happened there is that they had a filter on top of the model that altered its dialogue (IIRC it added a lot of extra emojis), and that drove it "insane" because its responses were all outside its own distribution.
You could see the same thing with Golden Gate Claude; it had a lot of anxiety about not being able to answer questions normally.
Nope, it was entirely due to the prompt they used. It was very long and tried to cover all the corner cases they'd thought up... and it ended up being too complicated and self-contradictory in real-world use.
Kind of like that scene in RoboCop where the OCP committee rewrites his original four directives with several hundred: https://www.youtube.com/watch?v=Yr1lgfqygio
That's a movie, though. You can't drive an LLM insane by giving it self-contradictory instructions; the contradictions would just average out.