Comment by lelanthran
7 months ago
> On r/LocalLLaMA there is someone who got 120B OSS running on 8 GB of RAM at 35 tokens/sec from the CPU (!!) after noticing that the 120B model has a different architecture, with only 5B “active” parameters
If anyone else was as interested as I was, here's the link: https://www.reddit.com/r/LocalLLaMA/comments/1mke7ef/120b_ru...