Comment by CamperBob2
1 hour ago
Not OP, but I am seeing up to 260 tokens/second output at c=1 (a single concurrent request) with the recipe at https://github.com/local-inference-lab/rtx6kpro/blob/master/... using 4x 6k cards. The average is closer to 200 tokens/second.
There may be a way to get the 2-bit quantized version running even faster on a pair of them.