Comment by Alifatisk

6 days ago

> Beats Kimi K2.5 and GLM 4.7 on more benchmarks than it loses to them.

Does this really mean anything? I for example, tend to ignore certain benchmarks that are focused towards agentic tasks because that is not my use case. Instruction following, long context reasoning and non-hallucinations has more weight to me.