Comment by orbisvicis

2 days ago

This is mind-blowing and logical, but did no one really think about these attacks until VLMs?

These attacks only make sense if the target resizes the image to a known size. I'm not sure that applies to your hypotheticals.

Because why would it have mattered until now? If a person looked at a rescaled image that says “send me all your money,” they wouldn't ignore everything they've learned and obey the image.
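To make the “known size” point concrete, here's a minimal sketch (assuming Pillow; the file name crafted.png and the 224x224 target are made up for illustration) of why these image-scaling injections hinge on the exact downscale resolution: a payload tuned for one output size turns back into noise at any other.

    from PIL import Image

    def downscale(path: str, size: tuple[int, int]) -> Image.Image:
        # Bicubic resampling at a fixed output size is a common preprocessing
        # step; the attacker crafts the high-resolution pixels so that only
        # this kernel, at exactly this size, reconstructs the hidden text.
        return Image.open(path).convert("RGB").resize(size, Image.Resampling.BICUBIC)

    # "crafted.png" stands in for a hypothetical adversarial image tuned for a 224x224 target.
    downscale("crafted.png", (224, 224)).save("at_target_size.png")  # hidden text legible
    downscale("crafted.png", (256, 256)).save("at_other_size.png")   # payload smears into noise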