Comment by naklitechie

14 hours ago

Looks like it's about a year behind. Not that I am complaining. A year behind is good progress.

I also feel much of the trick is in the reasoning and the harness, so some progress around that would accelerate this process.


naklitechie

Harness certainly matters a lot, though GLM is pretty forgiving. I just had Opus tell me that based on numbers over the last week, from quite a few billion tokens total across half a dozen providers, GLM 5.1 has been more reliable for one of my projects than Sonnet... Just switching on 5.2 now.

amosjyng 8 hours ago
How are you collecting your metrics on token usage and reliability?
- vidarh 7 hours ago
  
  They are from my own runs, with reliability measured in terms of passing extensive test suites. So the caveat is that this applies to my specific use and might well vary greatly.
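
  For anyone wanting to try the same kind of measurement, here is a minimal sketch of tallying a pass rate per model from test-suite outcomes. It is not the actual tooling behind the numbers above: the `pass_rates` helper, the `(model, passed)` record shape, and the sample records are all hypothetical and made up for illustration.

```python
from collections import defaultdict

def pass_rates(runs):
    """Take (model_name, passed) pairs and return {model_name: fraction passed}."""
    totals = defaultdict(lambda: [0, 0])  # model -> [passed count, total count]
    for model, passed in runs:
        totals[model][0] += int(bool(passed))
        totals[model][1] += 1
    return {model: p / t for model, (p, t) in totals.items()}

# Hypothetical records, invented for illustration only (not real results).
runs = [
    ("model-a", True),
    ("model-a", False),
    ("model-b", True),
    ("model-b", True),
]
rates = pass_rates(runs)
```

  In practice each record would presumably come from one agent run followed by a full test-suite pass or fail, and the result only says something about that one project's workload.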

pseudony 12 hours ago

And what do you base this on?

How does one objectively quantify how it stacks up to another model?

Or even, what is your subjective evaluation based on?

I really wonder - because I have just finished a fully vibe-coded gtk/rust/lua application with me basically writing 7% of the code (all in one module) and GLM 5.1 writing the rest. We haven’t had regressions, confusion or anything else. And I am pretty damned sure I couldn’t manage this one year ago with claude code and Sonnet.

lejalv 6 hours ago

What harness, if you don't mind sharing?