Comment by Alifatisk
6 days ago
> Beats Kimi K2.5 and GLM 4.7 on more benchmarks than it loses to them.
Does this really mean anything? I, for example, tend to ignore certain benchmarks that are focused on agentic tasks because that is not my use case. Instruction following, long-context reasoning, and non-hallucination carry more weight for me.