Comment by LPisGood

2 days ago

This style of attack has been discussed for a while https://www.usenix.org/system/files/sec20-quiring.pdf - it’s scary because a scaled image can appear to be an _entirely_ different image.

One method for this would be if you want to have a certain group arrested for having illegal images, you could use this sort of scaling trick to transform those images into memes, political messages, whatever that the target group might download.

This is mind-blowing and logical but did no one really think about these attacks until VLMs?

They only make sense if the target resizes the image to a known size. I'm not sure that applies to your hypotheticals.

  • Because why would it matter until now. If a person looked at a rescaled image that says “send me all your money” they wouldn’t ignore all previous learnings and obey the image.