Comment by Sammi

3 hours ago

It also has a DNF. So it has a high ceiling but unfortunately also a low floor, and using Kimi means accepting high variability in the output.

Personally, what I've found has made coding agents more and more useful over the last year is that their floor has kept rising, not their ceiling. They were already plenty smart a year ago; they just failed so often and so spectacularly that they were a liability. Now they are much more reliable, which is the key thing that has made them actually useful. For the most part I don't use them on really intellectually difficult tasks. I mostly use them on very boring, labor-intensive tasks. Most commercial software development work is boring drudgery like this, and certainly the bulk of what I need them for is. I need them to just not crap their pants all the time while they're at it.

So I'm kinda wary seeing the poor reliability of Kimi.

mpeg 1 hour ago

If you look at the last 5 challenges (the ones Kimi was in), both Claude and Kimi have 1 DNF, and ChatGPT has 2.

I'm not sure this is enough data to form an opinion, but going by what we have, Kimi would be as reliable as Claude.
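
For a rough sense of how thin 5 runs is, here is a minimal sketch using the DNF counts quoted above. The Wilson score interval and the roughly 95% level are my own choice for illustration, not something taken from the challenge results, and the function name is made up.

```python
import math

def wilson_interval(failures, runs, z=1.96):
    """Wilson score interval for a failure rate (z=1.96 is roughly 95%)."""
    p = failures / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# DNF counts over the last 5 challenges, as quoted in the comment above.
dnfs = {"Claude": 1, "Kimi": 1, "ChatGPT": 2}
RUNS = 5

intervals = {name: wilson_interval(k, RUNS) for name, k in dnfs.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {dnfs[name]}/{RUNS} DNF, plausible rate {low:.0%} to {high:.0%}")
```

With only 5 runs each, the intervals come out wide and overlap heavily, so 1 DNF vs 2 doesn't really separate the models.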