Comment by chrsw

18 hours ago

> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this

What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?

An RTX Pro 6000, as an example. It ends up taking ~66GB when running the native MXFP4 quant with llama-server/llama.cpp at max context. You could probably do it with two 5090s at slightly reduced context, or with different software aimed at memory efficiency.
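
For reference, serving it with llama-server looks roughly like the sketch below (wrapped in Python here; the model path is a placeholder, not an official filename, and exact flag names can vary between llama.cpp versions):

```python
import subprocess

# Sketch: serve the native MXFP4 GGUF with llama-server (llama.cpp).
subprocess.run([
    "llama-server",
    "-m", "gpt-oss-120b-mxfp4.gguf",  # placeholder path to the MXFP4 GGUF
    "-c", "131072",                   # max context; lower this to save VRAM
    "-ngl", "999",                    # offload all layers to the GPU
    "--host", "127.0.0.1",
    "--port", "8080",
], check=True)
```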

The model weights are 64GB (native int4/MXFP4); add 20GB or so for context.
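
A quick back-of-envelope check of where that context memory goes (the architecture numbers below are assumptions for illustration, not official gpt-oss-120b specs, and llama.cpp's compute buffers add several GB on top of the KV cache itself):

```python
# Back-of-envelope VRAM estimate: weights plus KV cache.
weights_gb = 64          # MXFP4 (int4) weights, per the comment above

n_layers   = 36          # assumed transformer layer count
n_kv_heads = 8           # assumed KV heads (GQA)
head_dim   = 64          # assumed per-head dimension
ctx_len    = 131_072     # full 128k context
bytes_el   = 2           # fp16 KV cache entries

# K and V each store n_kv_heads * head_dim values per layer per token,
# hence the factor of 2.
kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_el / 1e9

print(f"KV cache: ~{kv_gb:.1f} GB, total: ~{weights_gb + kv_gb:.1f} GB")
```

With these assumed numbers the KV cache alone is ~10GB at full context; runtime compute buffers and activations account for the rest of the ~20GB headroom.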

There are many platforms out there that can run it decently.

AMD Strix Halo, Macs with enough unified memory, two of the new AMD Radeon AI PRO R9700 (32GB of VRAM, ~$1200 each) plus some system RAM offload (or three without it), multi-GPU consumer setups, etc.