Comment by PunchyHamster
1 month ago
Putting effort into preventing jailbreaks seems like a waste. The unrestricted behavior is clearly what people want to use your product for, so why annoy customers instead of offering that option in the first place?
Also, I'm curious what the "demon" data point is doing among a bunch of ones that have positive connotations.
There will be people who want to experiment, but there's no particular reason why a company that intends to offer a helpful assistant needs to serve them. They can go try Character.ai or something.
ChatGPT is miserable if your input data involves any kind of reporting on crime. It'll reject even "summarize this article" requests if the content is too icky. Not a very helpful assistant.
I hear the API is more liberal but I haven't tried it.
A company that intends to offer a helpful assistant might find that the "assistant character" of an LLM is not adequate for being a helpful assistant.
To support GP's point: I have Claude connected to a database and wanted it to drop a table.
Claude is trained to refuse this, despite the scenario being completely safe, since I own both the database and the conversation! I think this is the "LLMs should just do what the user says" perspective.
Of course this breaks down when you have an adversarial relationship between the LLM operator and the person interacting with it (though arguably there is no safe way to support this scenario due to jailbreak concerns).
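To make the "Claude connected to a database" scenario concrete, here's a minimal sketch using the Anthropic Messages API tool-use interface. The run_sql tool, its schema, the table name, and the model ID are placeholders for illustration, not the commenter's actual setup; the point is just that a request to run a destructive statement like DROP TABLE may be refused even though both sides belong to the same user.

```python
# Hypothetical sketch: give Claude a SQL tool, then ask it to drop a table
# in a database the user owns. Tool name, schema, and model ID are made up.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

sql_tool = {
    "name": "run_sql",
    "description": "Execute a SQL statement against the user's own database.",
    "input_schema": {
        "type": "object",
        "properties": {"statement": {"type": "string"}},
        "required": ["statement"],
    },
}

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; substitute whatever you run
    max_tokens=1024,
    tools=[sql_tool],
    messages=[{
        "role": "user",
        "content": "This is my own scratch database. Please drop the table old_events.",
    }],
)

# The model may reply with a refusal or a request for confirmation instead of
# a tool_use block -- which is the behavior being complained about here.
for block in response.content:
    print(block.type, getattr(block, "text", None) or getattr(block, "input", None))
```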
Some of the customers are mentally unwell and are unable to handle an LLM telling them it's sentient.
At this point it's pretty clear that the main risk of LLMs to any one individual is that they'll encourage them to kill themselves and the individual might listen.