
Comment by nu11ptr

12 hours ago

What hardware do you have it running on? Do you feel you could replace the frontier models with it for everyday coding? Would/will you?

Around 20 tokens a second with a 6-bit quant at very long context lengths on my AMD Ryzen AI Max+ 395.

I’m trying to use local models whenever possible. Still need to lean on the frontier models sometimes.

60 to 70 tok/s on a 5080, but I'm only tinkering for now. The smaller models seem exceptionally good for what they are, and some can even do OCR reliably.

I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.

> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?

Probably not yet, but the A3B is really good at composing shell commands, short scripts, and one-liners. Its web development skills are also markedly better than those of Qwen's prior models in this parameter range.