
Comment by segmondy

1 day ago

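Use llama.cpp; you will be surprised how fast a model like qwen3.5-35b-a3b runs. The a3b means only about 3B parameters are active per token, so during inference only a small slice of the model is in play at any time and you will get amazing performance. For your system, you should use the -cmoe option, which keeps the MoE expert weights in CPU memory and leaves the rest of the model for your GPU.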
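As a rough illustration of the numbers and the command line involved, here is a minimal sketch. The 35B and 3B figures are read off the model name; the GGUF file name, the `llama-server` binary choice, and the `-ngl 99` value are assumptions for illustration, not something the comment specifies.

```python
import shlex

# Figures taken from the model name "qwen3.5-35b-a3b":
# roughly 35B total parameters, roughly 3B active per token.
TOTAL_PARAMS_B = 35.0
ACTIVE_PARAMS_B = 3.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def build_command(model_path: str, gpu_layers: int = 99) -> list[str]:
    """Assemble a llama.cpp server invocation.

    Assumptions: the `llama-server` binary is on PATH, `model_path`
    is a hypothetical local GGUF file, and `-ngl 99` is used as a
    common way to request that all layers be offloaded to the GPU.
    `-cmoe` is the option recommended in the comment.
    """
    return ["llama-server", "-m", model_path, "-ngl", str(gpu_layers), "-cmoe"]


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {frac:.1%}")
    # Hypothetical file name; substitute the path to your own download.
    print(shlex.join(build_command("qwen3.5-35b-a3b.gguf")))
```

Paste the printed command into a shell once the model path matches your own file. If you have VRAM to spare, it may be worth checking the llama.cpp help output for options that keep only some of the expert layers on the CPU.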
