Comment by irthomasthomas
8 days ago
If these are FP4 like the other Ollama models, then I'm not very interested. If I'm using an API anyway, I'd rather use the full weights.
OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.
Oh, I didn't know that. Weird!
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (fits on a single H100), and to allow better utilization of B200s (which are especially fast for FP4).
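For anyone curious what that means at the storage level: MXFP4 (per the OCP Microscaling spec) packs weights as 4-bit E2M1 values in blocks of 32, with each block sharing one power-of-two E8M0 scale. Here's a minimal NumPy sketch of the decode step plus the VRAM arithmetic behind the "fits on a single H100" point; the function name is mine, and the ~117B parameter count is an approximation:

```python
import numpy as np

# Representable values of FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the 4-bit code point, per the OCP Microscaling (MX) spec.
FP4_E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,            # sign bit = 0
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],   # sign bit = 1
    dtype=np.float32,
)

BLOCK = 32  # MX block size: 32 FP4 elements share one scale

def dequantize_mxfp4(codes: np.ndarray, scale_bits: np.ndarray) -> np.ndarray:
    """Decode MXFP4: 4-bit codes plus one E8M0 scale byte per 32-element block.

    codes      -- uint8 array of 4-bit code points (0..15), length a multiple of 32
    scale_bits -- uint8 array, one E8M0 exponent byte per block
    """
    vals = FP4_E2M1_VALUES[codes]                             # look up FP4 values
    scales = np.exp2(scale_bits.astype(np.float32) - 127.0)   # E8M0 scale = 2^(e-127)
    return (vals.reshape(-1, BLOCK) * scales[:, None]).reshape(-1)

# Rough footprint: 4 bits per weight plus one scale byte per 32 weights
# ~= 0.53 bytes/param, so a ~117B-param model is ~62 GB, under an H100's 80 GB.
params = 117e9  # approximate parameter count of gpt-oss-120b (assumption)
print(f"~{params * (0.5 + 1/32) / 1e9:.0f} GB of weights")
```

In full BF16 the same weights would be ~234 GB, which is why the MXFP4 release is what every provider serves.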