Comment by XCSme
13 hours ago
In my tests[0] it does only slightly better than Kimi K2.5.
Kimi K2.6 seems to struggle most with puzzle-style, domain-specific, and trick-question exactness tasks, where it frequently misses instructions and gives wrong answers.
It is probably a great coding model, but a bit less intelligent overall than the SOTA models.
[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...
I tried it on OpenRouter with max tokens set to 8192, and every response is truncated, even in non-thinking mode. Maybe there's an issue with the deployment, but your link also shows that it generates tons of output tokens.
Oh yeah, I just noticed, like 3x the reasoning tokens.
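For what it's worth, here's a minimal sketch of how you can confirm the truncation programmatically, assuming OpenRouter's OpenAI-compatible response format (a response cut off at `max_tokens` reports `finish_reason == "length"`); the model slug below is a guess, not a confirmed identifier:

```python
def is_truncated(response: dict) -> bool:
    """Return True if the first choice was cut off at the token limit.

    In the OpenAI-compatible format used by OpenRouter, a generation
    stopped by max_tokens has finish_reason == "length", while a normal
    completion has finish_reason == "stop".
    """
    return response["choices"][0].get("finish_reason") == "length"

# Example request payload (not sent here; the model slug is hypothetical):
payload = {
    "model": "moonshotai/kimi-k2.6",   # assumed slug, check OpenRouter's model list
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 8192,                # the limit that caused the truncation
}

# Simulated responses shaped like the OpenAI chat completions format:
cut_off = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
complete = {"choices": [{"finish_reason": "stop", "message": {"content": "..."}}]}

print(is_truncated(cut_off))   # True
print(is_truncated(complete))  # False
```

If the model burns most of the 8192-token budget on reasoning tokens, hitting the length stop on every reply would be consistent with what you're seeing.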