Comment by kelvie
3 days ago
gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did you mean you ran it via OpenRouter or something?
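A rough weight-only estimate makes the "doesn't fit" arithmetic concrete. It is a sketch under stated assumptions, not a measurement: it assumes about 117B total parameters for gpt-oss-120b, 32 GB of VRAM on the 5090, and a single uniform bit width for every weight. Real quantized files keep some tensors at higher precision and also need room for the KV cache and activations, so actual requirements run higher.

```python
# Rough weight-only memory estimate; ignores KV cache, activations, and runtime overhead.
# Assumptions (not from the thread): ~117e9 total parameters for gpt-oss-120b,
# 32 GB of VRAM on an RTX 5090, and every weight stored at the same bit width.

PARAMS = 117e9   # assumed total parameter count
VRAM_GB = 32     # assumed RTX 5090 VRAM, in GB (10**9 bytes)


def weight_gb(params: float, bits_per_param: float) -> float:
    """Gigabytes (10**9 bytes) needed to store `params` weights at the given bit width."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # MXFP4 is counted as 4 bits per element plus one 8-bit scale shared by 32 elements.
    for label, bits in [("BF16", 16), ("8-bit", 8), ("MXFP4", 4.25), ("3-bit", 3), ("2-bit", 2)]:
        gb = weight_gb(PARAMS, bits)
        verdict = "fits in VRAM" if gb <= VRAM_GB else "needs offloading"
        print(f"{label:>6}: {gb:6.1f} GB -> {verdict}")
```

Under these assumptions the weights alone exceed 32 GB at 16, 8, 4.25 and even 3 bits per weight; only at roughly 2 bits per weight do they squeeze under, which is where the "crazy quants" territory starts.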
3 days ago
> gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did you mean you ran it via openrouter or something?
I'm running the MXFP4 [0] quants at around 10-13 tokens/sec. It's actually really good. I'm starting to think it's a problem with Cline, since I just tried Qwen3 and the same thing happened. Turns out Cline _hates_ empty files in my projects, although they aren't required for this to happen.
[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...
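For context on what an MXFP4 quant stores, here is a minimal dequantization sketch. It assumes the OCP Microscaling (MX) v1.0 layout: blocks of 32 FP4 (E2M1) elements that share one E8M0 power-of-two scale. It is illustrative only, not the code any particular runtime uses; real kernels also pack two 4-bit codes per byte, which this sketch skips. The names `dequantize_block`, `decode_e2m1`, and `decode_e8m0` are hypothetical.

```python
# Minimal MXFP4 block dequantizer, assuming the OCP Microscaling (MX) v1.0 layout:
# 32 FP4 (E2M1) elements share one E8M0 power-of-two scale.
# Hypothetical helper names; codes are given one per int (no nibble packing).

from typing import List

# The 8 non-negative E2M1 values, indexed by the low 3 bits of a 4-bit code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 scale: unsigned exponent with bias 127; 0xFF encodes NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_block(scale_byte: int, codes: List[int]) -> List[float]:
    """Dequantize one block: every element is multiplied by the shared scale."""
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    scale = decode_e8m0(scale_byte)
    return [scale * decode_e2m1(c) for c in codes]


# Storage cost per weight: 32 elements * 4 bits + one 8-bit scale, over 32 weights.
BITS_PER_WEIGHT = (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE


if __name__ == "__main__":
    # Scale byte 128 means 2**1; code 0b0111 is +6.0 and code 0b1010 is -1.0.
    block = dequantize_block(128, [0b0111] * 16 + [0b1010] * 16)
    print(block[0], block[-1], BITS_PER_WEIGHT)
```

Under this layout each weight costs about 4.25 bits, so MXFP4 sits around 4 bits per weight rather than 2.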
Sounds like a crazy quant. IME, 2-bit quants are pretty dumb.