Comment by Terr_
7 months ago
I think the fickle "personality" of these systems is a clue to how the entity supposedly possessing a personality doesn't really exist in the the first place.
Stories are being performed at us, and we're encouraged to imagine characters have a durable existence.
If these 'personalities' disappearing require wholesale model changes then it's not really fickle.
That's not required.
For example, keep the same model, but change the early document (prompt) from stuff like "AcmeBot is a kind and helpful machine" to "AcmeBot revels in human suffering."
Users will say "AcmeBot's personality changed!" and they'll be half-right and half-wrong in the same way.
I'm not sure why you think this is just a prompt thing. It's not. Sycophancy is a problem with GPT-4o, whatever magic incantations you provide. On the flip side, Sydney, was anything but sycophantic and was more than happy to literally ignore users wholesale or flip out on them from time to time. I mean just think about it for a few seconds. If eliminating this behavior was as easy as Microsoft changing the early document, why not just do that and be done with it ?
The document or whatever you'd like to call it is only one part of the story.
1 reply →
LLMs have default personalities - shaped by RLHF and other post-training methods. There is a lot of variance to it, but variance from one LLM to another is much higher than that within the same LLM.
If you want an LLM to retain the same default personality, you pretty much have to use an open weights model. That's the only way to be sure it wouldn't be deprecated or updated without your knowledge.
I'd argue that's "underlying hidden authorial style" as opposed to what most people mean when they refer to the "personality" of the thing they were "chatting with."
Consider the implementation: There's document with "User: Open the pod bay doors, HAL" followed by an incomplete "HAL-9000: ", and the LLM is spun up to suggest what would "fit" to round out the document. Non-LLM code parses out HAL-9000's line and "performs" it at you across an internet connection.
Whatever answer you get, that "personality" is mostly from how the document(s) described HAL-9000 and similar characters, as opposed to a self-insert by the ego-less name-less algorithm that makes documents longer.
[dead]