Comment by fluoridation
7 hours ago
That's slower than just running it off CPU+GPU. I can easily hit 1.5 tokens/s on a 7950X+3090 with a 20480-token context.