Comment by empressplay

8 days ago

This article was a good read, but the writer doesn't seem to understand how model-based image generation actually works; the language suggests the image is somehow progressively constructed the way a human would do it, which is absurd.

I've noticed a lot of this misinformation floating around lately, and I can't help but wonder whether it's intentional.

I'm not sure what you're implying is incorrect or misleading. As noted in the post, autoregressive models like Nano Banana and gpt-image-1 generate an image token by token (each generated token attends to all previous tokens, both text and image), and those tokens are then decoded into pixels, while diffusion models generate the entire image simultaneously, refining it over n iteration steps.
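
To make the distinction concrete, here's a minimal sketch of the two generation loops in Python. All of the model calls (`next_token_logits`, `decode_tokens`, `predict_noise`) are dummy placeholders I made up for illustration, not any real library's API; the only point is that one loop extends a token sequence conditioned on its own prefix, while the other refines a whole-image noise tensor at every step.

```python
# Sketch of the two generation loops; all model calls are dummy stand-ins.
import numpy as np

def autoregressive_generate(next_token_logits, decode_tokens, prompt_tokens, num_image_tokens):
    """Emit image tokens one at a time; each new token is conditioned on the
    full prefix of text and image tokens generated so far."""
    tokens = list(prompt_tokens)
    for _ in range(num_image_tokens):
        logits = next_token_logits(tokens)       # attends to everything so far
        tokens.append(int(np.argmax(logits)))    # greedy pick, for brevity
    return decode_tokens(tokens)                 # tokens -> pixels via a decoder

def diffusion_generate(predict_noise, shape, n_steps):
    """Start from noise covering the whole canvas and refine every pixel
    simultaneously at each of the n steps."""
    x = np.random.randn(*shape)                  # the full image exists from step 0
    for t in reversed(range(n_steps)):
        x = x - predict_noise(x, t) / n_steps    # crude update; real samplers differ
    return x

# Dummy stand-ins so the sketch runs end to end:
vocab_size = 256
image_a = autoregressive_generate(
    next_token_logits=lambda toks: np.random.rand(vocab_size),
    decode_tokens=lambda toks: np.array(toks),
    prompt_tokens=[1, 2, 3],
    num_image_tokens=16,
)
image_b = diffusion_generate(lambda x, t: np.random.randn(*x.shape), shape=(8, 8), n_steps=10)
```

The greedy argmax and the linear noise update are crude simplifications (real samplers use temperature sampling and learned noise schedules), but the control flow is the part that matters for the comparison above.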