
Comment by GaggiX

7 hours ago

Well, with a standard autoregressive model you can generate, for example, 256 tokens at once if you have 256 users (one token per user per forward pass). With this approach you can generate 256 tokens for a single user, but you need several forward steps to do it.

So the diffusion process takes more GFLOPs per generated token. And if you have enough users, batching with an autoregressive model already lets you balance memory and compute.
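To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes one forward pass costs about 2 × (parameter count) FLOPs per token position and ignores attention-over-context cost and KV caching. The 7B parameter count and the 32 denoising steps are hypothetical numbers, not figures for any particular model.

```python
# Back-of-the-envelope FLOP comparison: batched autoregressive decoding
# vs. diffusion over a block of tokens for a single user.
# Assumption: one forward pass costs ~2 * n_params FLOPs per token position
# (attention cost and KV-cache effects are ignored).


def forward_flops(n_params: int, positions: int) -> int:
    """Approximate FLOPs for one forward pass over `positions` token positions."""
    return 2 * n_params * positions


def autoregressive_batched(n_params: int, users: int) -> tuple[int, int]:
    """One decode step for `users` users: each user gets exactly 1 new token."""
    tokens = users
    flops = forward_flops(n_params, users)
    return tokens, flops


def diffusion_single_user(n_params: int, block: int, steps: int) -> tuple[int, int]:
    """`steps` denoising passes over a `block`-token canvas for one user."""
    tokens = block
    flops = steps * forward_flops(n_params, block)
    return tokens, flops


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    ar_tokens, ar_flops = autoregressive_batched(n, users=256)
    df_tokens, df_flops = diffusion_single_user(n, block=256, steps=32)  # hypothetical step count

    print(f"AR (256 users):       {ar_tokens} tokens, {ar_flops / 1e9:.0f} GFLOPs")
    print(f"Diffusion (1 user):   {df_tokens} tokens, {df_flops / 1e9:.0f} GFLOPs")
    ratio = (df_flops / df_tokens) / (ar_flops / ar_tokens)
    print(f"FLOPs per token, diffusion vs AR: {ratio:.0f}x")
```

Under these assumptions the per-token cost ratio is simply the number of denoising steps: both approaches produce 256 tokens, but the diffusion model runs the 256-position forward pass once per step, while the batched autoregressive model runs it once and spreads it across 256 users.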