Comment by satvikpendem

1 year ago

Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.

11 comments

mentos 1 year ago

Could you have the AI just flag posts for a human to review in v1? Then, as you refine the prompt injection detection, you could move toward letting the AI act autonomously.

satvikpendem 1 year ago
There is no way to fully rule out a prompt injection attack. There are always ways to convince the AI to do something other than flag a post, even if flagging is its initial instruction.
- mentos 1 year ago
  
  The raw text of the person's message can/will be posted to the forum anyway, so if it's a prompt injection it will be obvious to the community, who can flag it for human review and get the account banned.
  
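A minimal sketch of the "flag for human review in v1" idea discussed above, assuming the model is only ever asked for a one-word answer. The names `triage`, `Verdict`, `Decision`, and the `classify` callable are hypothetical stand-ins, not any real moderation API; `classify` represents whatever model call the forum would actually use. Any reply other than the two expected words is sent to a human rather than acted on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    OK = "OK"
    NEEDS_REVIEW = "NEEDS_REVIEW"


@dataclass
class Decision:
    post_id: int
    verdict: Verdict
    reason: str


def triage(post_id: int, text: str, classify: Callable[[str], str]) -> Decision:
    """Route a post based on a one-word model reply.

    `classify` is a hypothetical stand-in for the model call and is
    assumed to return "OK" or "FLAG". The function never bans or deletes
    anything itself; it only decides whether a human should look.
    """
    raw = classify(text).strip().upper()
    if raw == "OK":
        return Decision(post_id, Verdict.OK, "model reported no issue")
    if raw == "FLAG":
        return Decision(post_id, Verdict.NEEDS_REVIEW, "model flagged the post")
    # Anything else (including a model that was talked into chatting
    # instead of classifying) is treated as suspicious.
    return Decision(post_id, Verdict.NEEDS_REVIEW, "unexpected model output")
```

This only narrows what an injected post can make the system do. It does not address the objection raised in the thread: a post could still persuade the model to answer "OK", in which case it passes unreviewed unless the community reports it.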