Comment by interleave

2 months ago

Hey Tanya! Thank you for helping me understand the results better.

I just posted the results of another basic interview analysis (4o vs. Llama 4) here: https://x.com/SpringStreetNYC/status/1923774145633849780

To your point: do I understand correctly that, for example, when running the default Llama 4 model via Ollama, the effective context window is very short even though the model's advertised context is something like 10M tokens? And that to "unlock" the full context, I'd need to pull the unquantized version?

For reference, here's what `ollama show llama4` returns:

```
parameters          108.6B      # llama4:scout
context length      10485760    # 10M
embedding length    5120
quantization        Q4_K_M
```
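
In case it helps frame the question, this is the kind of override I had in mind, a minimal sketch assuming Ollama's standard `num_ctx` option. The `131072` value and the `llama4-longctx` name are just illustrative; whatever actually fits depends on available memory:

```sh
# Sketch: two common ways to raise Ollama's context window (num_ctx).
# Assumes the `llama4` tag from above; 131072 is an arbitrary example value.

# 1) Per-request, via the HTTP API's options field:
curl http://localhost:11434/api/generate -d '{
  "model": "llama4",
  "prompt": "Summarize the interview transcript.",
  "options": { "num_ctx": 131072 }
}'

# 2) Baked in, via a derived model built from a Modelfile:
cat > Modelfile <<'EOF'
FROM llama4
PARAMETER num_ctx 131072
EOF
ollama create llama4-longctx -f Modelfile
ollama run llama4-longctx
```

Is that the right knob here, or does the quantized build cap the window lower regardless of `num_ctx`?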