Comment by cbolton
3 months ago
You can bypass the system prompt by using the API? I thought part of the "safety" of LLMs was implemented with the system prompt. Does that mean it's easier to get unsafe answers by using the API instead of the GUI?
Safety comes from both the system prompt and the RLHF post-training that teaches the model to refuse adversarial inputs.
Yes, it is.
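For concreteness, here's a minimal sketch (using the OpenAI Python SDK; the model name is just a placeholder) of what "using the API" means in practice: the caller supplies the system message in the request, so whatever system prompt the vendor's GUI would have prepended simply isn't there. Any refusals you still get come from the post-training baked into the weights, not from a hidden prompt.

```python
# Minimal sketch: with the API, the caller writes (or omits) the system prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # This message is entirely under the caller's control; it can also be left out.
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Explain what a system prompt does."},
    ],
)
print(response.choices[0].message.content)
```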