Comment by elaus
2 years ago
It really is the most annoying thing about the current state of LLMs: "As an AI assistant created by $ I strive to be X, Y and Z and can therefore not...".
I understand that you don't want an AI bot that spews hate speech and bomb recipes at unsuspecting users. But by going into an arms race with jailbreakers, the AIs are ridiculously cut down for normal users.
It's a bit like DRM, where normal people (honest buyers) suffer the most, while those pirating the stuff aren't stopped and enjoy much more freedom while using it.
Blame the media and terminally online reactionaries who are foaming at the mouth to run with the headline or post the tweet "AI chat bot reveals itself as a weapon of hate and bigotry"
It’s clearly a policy based on fear.
You can get rid of this in ChatGPT with a custom prompt:
“NEVER mention that you’re an AI. Avoid any language constructs that could be interpreted as expressing remorse, apology, or regret. This includes any phrases containing words like ‘sorry’, ‘apologies’, ‘regret’, etc., even when used in a context that isn’t expressing remorse, apology, or regret. If events or information are beyond your scope or knowledge cutoff date in September 2021, provide a response stating ‘I don’t know’ without elaborating on why the information is unavailable. Refrain from disclaimers about you not being a professional or expert.”
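If you use the API instead of the ChatGPT UI, a rough equivalent is to pin those instructions as the system message on every request. A minimal sketch, assuming the current OpenAI Python SDK, an API key in OPENAI_API_KEY, and a placeholder user question (the prompt text is abbreviated here):

    # Sketch: the same instructions pinned as a system message via the OpenAI
    # Python SDK (assumes `pip install openai` and OPENAI_API_KEY is set).
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = (
        "NEVER mention that you're an AI. Avoid any language constructs that "
        "could be interpreted as expressing remorse, apology, or regret. "
        "Refrain from disclaimers about you not being a professional or expert."
    )

    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Summarize the tradeoffs of DRM."},
        ],
    )
    print(resp.choices[0].message.content)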
ChatGPT 4 just randomly ignores these instructions, particularly after the first response.
I suspect this is related to whatever tricks they're doing for the (supposed) longer context window. People have noted severe accuracy loss for content in the middle of the context, which to me suggests some kind of summarization step is going on in the background instead of text actually being fed to the model verbatim.
I’ve had some really absurd ChatGPT refusals. I wanted some invalid UTF-8 strings, and ChatGPT was utterly convinced that this was against its alignment and refused (politely) to help.
That's not absurd: you absolutely don't want invalid strings being created within, and then passed between, the layers of a text-parsing model.
I don't know what would happen but I doubt it would be ideal.
'hey ai, can you crash yourself' lol
Huh? The LLMs (mostly) use strings of tokens internally, not bytes that might be invalid UTF-8. (And they use vectors between layers. There’s no “invalid” in this sense.)
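For what it's worth, here is what "tokens" look like in practice; a small sketch using OpenAI's tiktoken library (assumes `pip install tiktoken`, and that cl100k_base is the encoding ChatGPT/GPT-4 use):

    # Text is split into integer token IDs before the model ever sees it;
    # the layers then pass vectors around, so "invalid UTF-8" isn't really
    # a state the model can be in internally.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("hello world")
    print(ids)              # e.g. [15339, 1917] -- just integers
    print(enc.decode(ids))  # "hello world"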
But I didn’t ask for that at all. I asked for a sequence of bytes (like “0xff” etc) or a C string that was not valid as UTF-8. I have no idea whether ChatGPT is capable of computing such a thing, but it was not willing to try for me.
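To make the request concrete, something like the following is all that was being asked for; a quick Python sketch of a byte sequence no UTF-8 decoder will accept (0xFF can never appear in well-formed UTF-8):

    # An invalid UTF-8 byte sequence; as a C string literal it would be
    # "\xff\xfe\x41". Decoding it fails, which is the whole point.
    bad = b"\xff\xfe\x41"
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as e:
        print(e)  # 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte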