Comment by Patrick_Devine
2 hours ago
The default weights on Ollama use MXFP4 for the feed-forward network and BF16 for the attention weights. The default weights for llama.cpp quantize the attention tensors as q8_0, which is why llama.cpp can eke out a little more performance at the cost of worse output. If you're using this for coding, you definitely want the better output.
You can use the command `ollama show -v gpt-oss:120b` to see the datatype of each tensor.
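If you'd rather inspect the tensor datatypes programmatically, here's a minimal sketch using the `gguf` Python package that ships with llama.cpp (`pip install gguf`); the filename is just a placeholder for whatever GGUF file you have locally.

```python
from gguf import GGUFReader

# Hypothetical path; point this at your own GGUF file.
reader = GGUFReader("gpt-oss-120b.gguf")

for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum (e.g. Q8_0, BF16),
    # so you can see per-tensor which datatype the file actually uses.
    print(f"{tensor.name}: {tensor.tensor_type.name}")
```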