Comment by gundmc
7 days ago
Well, their huge GPU clusters have "insane VRAM". Once you can actually load the model without offloading, inference is relatively cheap: autoregressive decoding is mostly memory-bandwidth-bound, not compute-bound.
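For intuition, here's a rough back-of-the-envelope sketch of that claim (all numbers are hypothetical, and the bound ignores KV-cache traffic): generating one token requires streaming every weight from memory once, so single-stream decode speed is capped by bandwidth divided by model size.

```python
# Rough decode-speed ceiling: each generated token streams all weights
# from memory once, so throughput is bounded by bandwidth / model size.

def max_decode_tokens_per_sec(param_count: float, bytes_per_param: float,
                              mem_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed, ignoring KV-cache reads."""
    model_bytes = param_count * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / model_bytes

# Hypothetical example: a 70B-parameter model in FP16 on an accelerator
# with ~3350 GB/s of HBM bandwidth (roughly H100-class).
print(f"{max_decode_tokens_per_sec(70e9, 2, 3350):.0f} tokens/sec ceiling")
# ~24 tokens/sec per stream; batching amortizes the weight reads across
# many users, which is why large clusters can serve inference cheaply
# once the weights fit in VRAM.
```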