Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library
← Back to context

Comment by vardalab

6 days ago

Q4 quants on 32G VRAM gives you 131K context for 35BA3B and 27B models who are pretty capable. On 5090 one gets 175 tg and ~7K pp with 35BA3B, 27B isaround 90 tg. So speed is awesome. Even Strix 395 gives 40 tk/s and 256K context. Pretty amazing, there is a reason people are excited about qwen 3.5

0 comments

vardalab

Reply

No comments yet

Contribute on Hacker News ↗

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities