Comment by blizdiddy

5 days ago

Gpt-oss is natively 4-bit, so you kinda can

You can fit the weights plus a tiny context window into 24GB, absolutely, but you can't fit a context of any reasonable size. Either that or Ollama's implementation is broken; when I last tried it, the context had to be restricted beyond usability just to keep it from freezing up the entire machine.
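For anyone doing the napkin math on where the 24GB goes, here's a rough sketch of the KV-cache estimate. The layer/head numbers below are illustrative placeholders, not confirmed gpt-oss config values, so swap in the real ones from the model's config before trusting the output:

```python
# Rough KV-cache size: 2 (K+V) * layers * kv_heads * head_dim
#                      * bytes_per_element * context_length
# The architecture numbers here are placeholders, NOT confirmed gpt-oss values.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Return the KV-cache size in GiB for a given context length (fp16 by default)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len
    return total_bytes / 1024**3

# Example: hypothetical 24-layer model with 8 KV heads of dim 64, fp16 cache.
for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(24, 8, 64, ctx):.2f} GiB of KV cache")
```

Whatever the real numbers work out to, that cache is on top of the quantized weights, which is why the context has to stay tiny on 24GB. If you do want to cap it in Ollama rather than let it spill, `num_ctx` (in the Modelfile or the API options) is the usual knob.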