Comment by CamperBob2

1 hour ago

Not OP, but I'm seeing up to 260 tokens/second of output at c=1 (a single concurrent request) with the recipe at https://github.com/local-inference-lab/rtx6kpro/blob/master/... using 4x 6k cards. The average is closer to 200.

There may be a way to get the 2-bit quantized version running even faster on a pair of them.
