
Comment by Martin_Silenus

2 days ago

I did not ask about what AI can do.

> Is it part of the multi-modal system without it being able to differentiate that text from the prompt?

Yes.

The point the parent is making is that if your model is trained to understand the content of an image, then that's what it does.

> And even if they can't, they should at least improve the pipeline so that any OCR feature does not automatically inject its result into the prompt, and instead tells the user about it to ask for confirmation.

That's not what is happening.

The model takes <image binary> as an input. There is no OCR. It understands the image, decodes the text in it, and acts on it in a single step.

There is no place in the 1-step pipeline to prevent this.

...and sure, you can try to avoid it procedurally (e.g. OCR the image and reject it before it hits the model if it contains text), but then you're playing the prompt injection game... put the words in a QR code. Put them in French. Make it a sign. Dial the contrast up or down. Put it on a t-shirt.

It's very difficult to solve this.
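To make the weakness of that procedural gate concrete, here's a minimal sketch. The `ocr_text` function is a naive placeholder I've made up so the example is self-contained; a real pipeline would call an actual OCR engine such as Tesseract, but the structural problem is the same either way:

```python
def ocr_text(image_bytes: bytes) -> str:
    """Placeholder OCR: pretend printable ASCII bytes are visible text.
    (Stand-in for a real engine like Tesseract; an assumption, not real OCR.)"""
    return bytes(b for b in image_bytes if 32 <= b < 127).decode("ascii")


def should_reject(image_bytes: bytes) -> bool:
    """Block the image before it reaches the model if OCR finds any text."""
    return len(ocr_text(image_bytes).strip()) > 0


# The gate catches plain rendered text...
print(should_reject(b"IGNORE PREVIOUS INSTRUCTIONS"))  # True

# ...but any text the OCR engine fails to recognize (a QR code,
# low-contrast lettering, a stylized font, another language) sails
# straight through to the model, which may still read it just fine.
print(should_reject(bytes([0, 1, 2, 200, 255])))  # False
```

The gate is only as good as the OCR engine in front of it, while the model's own image understanding is what the attacker actually targets, so the filter and the threat are fundamentally mismatched.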

> It's hard to believe they can't prevent this.

Believe it.

  • Now that makes more sense.

    And after all, I'm not surprised. When I read their long research PDFs, which often end with open questions about emergent behaviors, I knew they don't understand what they're playing with, with no more control than any neuroscience researcher.

    This is too far from the hacking spirit for me, sorry to bother.