Comment by jszymborski

2 months ago

I just got an RTX 5090, so I thought I'd see what all the fuss over these AI coding tools was about. I'd previously copy-pasted back and forth with Claude, but had never used the instruct models.

So I fired up Cline with gpt-oss-120b, asked it to tell me what a specific function does, and proceeded to watch it run `cat README.md` over and over again.

I'm sure it's better with the Qwen Coder models, but it was a pretty funny first look.

kelvie 2 months ago

gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did you mean you ran it via OpenRouter or something?
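Here's the rough arithmetic behind the "doesn't fit" point. It assumes ~117B total parameters, every weight stored at MXFP4's ~4.25 bits (4-bit values plus one shared 8-bit scale per 32 weights), and the 5090's 32 GB of VRAM. It ignores the KV cache, activations, and any layers kept at higher precision, so treat the result as a lower bound:

```python
# Back-of-the-envelope weight memory; a lower bound, not a measured footprint.
# Assumption: all ~117B parameters are stored in MXFP4, where 32 FP4 elements
# share one 8-bit scale, so each element costs 4 + 8/32 = 4.25 bits.
PARAMS = 117e9                 # approximate total parameter count of gpt-oss-120b
BITS_PER_PARAM = 4 + 8 / 32    # MXFP4 element bits + amortized shared scale
VRAM_GB = 32                   # RTX 5090 memory


def weight_gb(params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


need = weight_gb(PARAMS, BITS_PER_PARAM)
print(f"~{need:.1f} GB of weights vs {VRAM_GB} GB of VRAM")
print("fits" if need <= VRAM_GB else "needs offloading")
```

That comes out to roughly 62 GB before any runtime overhead, about twice the card's VRAM, which is why something has to spill over to system RAM.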

jszymborski 2 months ago

I'm running the MXFP4 [0] quants at like 10-13 toks/sec. It's actually really good; I'm starting to think it's a problem with Cline, since I just tried it with Qwen3 and the same thing happened. Turns out Cline _hates_ empty files in my projects, although they aren't required for this to happen.
[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...
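For context on the format: MXFP4 is the OCP Microscaling FP4 format. Each block of 32 weights is stored as 4-bit E2M1 floats plus one shared power-of-two scale (an 8-bit E8M0 exponent). Below is a minimal pure-Python sketch of quantizing and dequantizing one block. It assumes simple round-to-nearest with ties going to the smaller magnitude, and it does not model the spec's exact rounding rules or the two-codes-per-byte packing that real kernels use. The function names are made up for illustration:

```python
import math

# FP4 E2M1 magnitudes indexed by the 3 low bits; bit 3 (0b1000) is the sign.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32       # elements sharing one scale
E8M0_BIAS = 127  # shared scale is 2**(e - 127); e == 255 encodes NaN
FP4_EMAX = 2     # largest FP4 magnitude is 6.0 = 1.5 * 2**2


def decode_fp4(code: int) -> float:
    """Turn one 4-bit code into its float value (before scaling)."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * FP4_E2M1[code & 0b0111]


def dequantize_block(codes: list[int], scale_e8m0: int) -> list[float]:
    """Apply the block's shared power-of-two scale to 32 FP4 codes."""
    if len(codes) != BLOCK:
        raise ValueError(f"expected {BLOCK} codes, got {len(codes)}")
    if scale_e8m0 == 255:
        return [math.nan] * BLOCK
    scale = 2.0 ** (scale_e8m0 - E8M0_BIAS)
    return [decode_fp4(c) * scale for c in codes]


def quantize_block(values: list[float]) -> tuple[list[int], int]:
    """Simplified quantizer: pick the shared exponent from the block's max
    magnitude, then round each scaled value to the nearest FP4 value."""
    if len(values) != BLOCK:
        raise ValueError(f"expected {BLOCK} values, got {len(values)}")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * BLOCK, E8M0_BIAS  # all zeros; any scale works
    shared_exp = math.floor(math.log2(max_abs)) - FP4_EMAX
    scale_e8m0 = min(max(shared_exp + E8M0_BIAS, 0), 254)
    scale = 2.0 ** (scale_e8m0 - E8M0_BIAS)
    codes = []
    for v in values:
        mag = min(abs(v) / scale, FP4_E2M1[-1])  # saturate at 6.0
        # min() keeps the first of equal candidates, so ties go to the smaller magnitude
        idx = min(range(len(FP4_E2M1)), key=lambda i: abs(FP4_E2M1[i] - mag))
        sign = 0b1000 if v < 0 and idx > 0 else 0
        codes.append(sign | idx)
    return codes, scale_e8m0


vals = [1.0, -2.0, 3.0, 0.5] + [0.0] * 28
codes, e = quantize_block(vals)
print(e, dequantize_block(codes, e)[:4])  # prints: 126 [1.0, -2.0, 3.0, 0.5]
```

These sample values happen to land exactly on representable FP4 points after scaling. Real weights generally don't, and the rounding error introduced here is where quantization quality comes from.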

kube-system 2 months ago

Sounds like a crazy quant. IME 2-bit quants are pretty dumb.