Comment by XCSme

1 month ago

General intelligence (not coding) comparison: https://aibenchy.com/compare/z-ai-glm-5-medium/z-ai-glm-5-1-...


BoorishBears 1 month ago

Is there really no rule that discourages 99% of your interactions with HN from being peddling some useless slop benchmark?

XCSme 1 month ago
If it's relevant to the discussion, I hope not.
I've spent probably over 100 hours working on this benchmarking site/platform, and all tests are manually written. For me (and many others who have reached out to me), it is not useless either. I use it myself regularly when choosing and comparing new models. I honestly believe it is providing value to the conversation.
Let me know if you know of a better platform for comparing models. I built this one because I didn't find any with good enough UX.
- jaggs 1 month ago
  
  It's a great benchmark. Don't listen to the haters. This one is especially interesting.
  https://aibenchy.com/compare/anthropic-claude-sonnet-4-6-med...