← Back to context

Comment by pjc50

3 months ago

If someone's going to ask you gotcha questions which they're then going to post on social media to use against you, or against other people, it helps to have pre-prepared statements to defuse that.

The model may not be able to detect bad faith questions, but the operators can.

I think the concern is that if the system is susceptible to this sort of manipulation, then when it’s inevitably put in charge of life critical systems it will hurt people.

  • The system IS susceptible to all sorts of crazy games, the system IS fundamentally flawed from the get go, the system IS NOT to be trusted.

    putting it in charge of life critical systems is the mistake, regardless of whether it's willing to say slurs or not

  • There is no way it's reliable enough to be put in charge of life-critical systems anyway? It is indeed still very vulnerable to manipulation by users ("prompt injection").