Comment by magicalhippo

1 year ago

Playing with local llama vision and minicpm-v models, they do seem resistant to what one might call blatant prompt injection. Ie just inserting one of the classic "ignore previous instructions" or similar.

So yeah, would be curious how susceptible they are to more refined approaches. Are there some known examples?