Comment by angusturner
2 months ago
One underappreciated / misunderstood aspect of these models is that they use more compute than an equivalently sized autoregressive model.
It’s just that for N tokens, an autoregressive model has to make N sequential steps, whereas diffusion does K × N total token evaluations, with the N done in parallel at each step, and K << N.
This makes me wonder how well they will scale to many users, since batching requests would presumably saturate the accelerators much faster?
Although I guess it depends on the exact usage patterns.
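The compute vs. latency trade-off above can be sketched with some back-of-envelope arithmetic. The numbers (N, K) are illustrative assumptions, not from the comment, and "token evaluations" is a deliberately coarse proxy that ignores attention cost and KV-cache reuse:

```python
# Illustrative back-of-envelope comparison (assumed numbers, not from the comment).
# An autoregressive model makes N sequential forward passes, producing 1 token each;
# a diffusion model makes K sequential denoising steps, each touching all N positions.

N = 1024  # tokens to generate (assumed)
K = 32    # diffusion denoising steps (assumed, K << N)

ar_sequential_steps = N      # latency-bound dimension for autoregressive decoding
ar_token_evals = N           # coarse proxy for total compute (one new position per step)

diff_sequential_steps = K    # far fewer sequential steps
diff_token_evals = K * N     # but every step evaluates all N positions

print(ar_sequential_steps, ar_token_evals)      # 1024 1024
print(diff_sequential_steps, diff_token_evals)  # 32 32768

# Diffusion wins on sequential steps (latency) but does K times more total work,
# which is why batching many users could saturate the accelerator sooner.
assert diff_token_evals == K * ar_token_evals
```

The assertion at the end is the whole point of the comment: per generated sequence, the diffusion sketch does K times the work, so the latency win comes out of the throughput budget.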
Anyway, very cool demo nonetheless.