Comment by djsjajah
1 day ago
> Do you really though?
Yes.
It stays in HBM, but it needs to get moved to the place where the computation actually happens. It's a lot like a normal CPU: the CPU can't do anything with data while it sits in system memory, it has to be loaded into a CPU register first. For every token that is generated, a dense LLM has to read every parameter in the model.
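To put rough numbers on that, here is a back-of-envelope sketch. It assumes batch size 1, a dense model, and that reading the weights is the only memory traffic (no KV cache, no activations). The function name and the figures in the example call are made up for illustration, not specs for any real GPU or model.

```python
def max_tokens_per_second(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode speed if every parameter is read once per token.

    Assumption: memory bandwidth is the only limit; compute is free and
    nothing besides the weights is read.
    """
    bytes_per_token = num_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


# Hypothetical example: 70B parameters at 2 bytes each (e.g. fp16),
# on a device with 1.4 TB/s of memory bandwidth (illustrative number).
bound = max_tokens_per_second(
    num_params=70_000_000_000,
    bytes_per_param=2,
    bandwidth_bytes_per_s=1_400_000_000_000,
)
```

With those made-up numbers the weights alone are 140 GB per token, so bandwidth caps you at about 10 tokens per second no matter how fast the compute units are. Batching helps because the same weight read gets shared across many sequences.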