Comment by CamperBob2

1 hour ago

Not OP, but I'm seeing up to 260 tokens/second of output at c=1 (a single concurrent request) with the recipe at https://github.com/local-inference-lab/rtx6kpro/blob/master/... on 4x RTX 6000 Pro cards. The average is more like 200.

There may be a way to get the 2-bit quantized version running even faster on a pair of them.
