Comment by minimaxir

8 days ago

I'm not sure what you're implying is incorrect/misleading. As noted in the post, autoregressive models like Nano Banana and gpt-image-1 generate by token (and each generated token attends to all previous tokens, both text and image) which are then decoded, while diffusion models generate the entire image simultaneously, refined over n iteration steps.