
Comment by throwdbaaway

6 days ago

Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP and 40.7 tok/s TG at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context.
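The context-length penalty implied by those numbers can be worked out directly (a quick sketch using only the figures quoted above):

```python
# Throughput at zero context vs. 40960 context, from the comment above.
pp_zero, pp_40k = 1312.0, 1009.0   # prompt processing, tok/s
tg_zero, tg_40k = 40.7, 36.2       # token generation, tok/s

# Relative slowdown as a percentage.
pp_drop = (1 - pp_40k / pp_zero) * 100
tg_drop = (1 - tg_zero and 1 - tg_40k / tg_zero) * 100

print(f"PP drops {pp_drop:.1f}%, TG drops {tg_drop:.1f}%")
# → PP drops 23.1%, TG drops 11.1%
```

So prompt processing degrades roughly twice as fast as generation over that context window.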

The 35B A3B is faster but didn't do too well in my limited testing.


ranger_danger  6 days ago

With regular llama.cpp on a 3070 Ti I get 60 tok/s TG with the 9B model; it's quite impressive.
