Comment by pjc50

3 months ago

If someone's going to ask you gotcha questions which they're then going to post on social media to use against you, or against other people, it helps to have pre-prepared statements to defuse that.

The model may not be able to detect bad faith questions, but the operators can.


pmichaud 3 months ago

I think the concern is that if the system is susceptible to this sort of manipulation, then when it’s inevitably put in charge of life critical systems it will hurt people.

mrguyorama 3 months ago

The system IS susceptible to all sorts of crazy games; the system IS fundamentally flawed from the get-go; the system IS NOT to be trusted.
Putting it in charge of life-critical systems is the mistake, regardless of whether it's willing to say slurs or not.

pjc50 3 months ago

There is no way it's reliable enough to be put in charge of life-critical systems anyway? It is indeed still very vulnerable to manipulation by users ("prompt injection").
- ben_w 3 months ago
  
  Just because neither you nor I would deem it safe to put in charge of a life-critical system does not mean all the people in charge of life-critical systems are as cautious and diligent as they're supposed to be.
- klaff 3 months ago
  
  https://www.businessinsider.com/even-top-generals-are-lookin...