Comment by satvikpendem
19 days ago
Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.
Could you have the AI just flag posts for a human to review in v1? Then, as you refine the prompt injection detection, you could move toward making the AI autonomous.
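A minimal sketch of what a flag-only v1 could look like, assuming the model is only ever asked for a one-word verdict. The names `moderate`, `ReviewQueue`, and the `model` callable are hypothetical, not any real forum or LLM API. The model's output is never executed or acted on directly: an exact "OK" lets the post through, and anything else lands in a human review queue.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReviewQueue:
    """Hypothetical holding area for posts awaiting a human moderator."""
    pending: List[str] = field(default_factory=list)

    def add(self, post_id: str) -> None:
        self.pending.append(post_id)


def moderate(post_id: str, text: str,
             model: Callable[[str], str], queue: ReviewQueue) -> bool:
    """Return True if the post was queued for human review.

    `model` is an assumed stand-in for whatever LLM call is used; it takes
    the post text and returns a raw string verdict.
    """
    verdict = model(text).strip().upper()
    # Only an exact "OK" passes. "FLAG" and any unexpected output
    # (for example, the model following injected instructions) go to a human.
    if verdict == "OK":
        return False
    queue.add(post_id)
    return True
```

This only limits what an injection can accomplish. A post that talks the model into answering "OK" would still slip through unflagged.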
There is no way to fully eliminate prompt injection attacks. There are always ways to convince the AI to do something other than flag a post, even if flagging is its initial instruction.
The raw text of the person's message will be posted to the forum anyway, so a prompt injection would be obvious to the community. It could then be flagged for human review and the account banned.