Comment by kristopolous
8 hours ago
I've built a tool to track these.
Relatively speaking here's where it's at:
score  age  size   name
44.2   97   large  GLM-5 (Reasoning)
44.7   187  -      GPT-5.1 (high)
44.9   29   -      Qwen3.6 Max Preview
45     0    -      Gemini 3.5 Flash
45.5   27   large  MiMo-V2.5-Pro
45.6   75   -      GPT-5.4 (low)
This is from Artificial Analysis, using https://github.com/day50-dev/aa-eval-email/blob/main/art-ana...
I really don't know why people downvote me. What do I need to say to make things for free that people like? Sincere question. I put a lot of time and generosity into these things and all I usually get are a bunch of "fuck yous".
This is honestly an existential issue for me. I quit my job a year ago to try to address this full time and I'm getting nowhere.
Buddy, this tone may be why.
We genuinely don't understand what your post is about. What is this tool? What do these numbers represent? Why are things sorted in that order?
You haven't really communicated anything at all. I am interested, I'd like to understand. Write a more complete post, please.
Are you familiar with https://artificialanalysis.ai/leaderboards/models?
The JSON on the page has a coding index result that it hides from the table.
That's what this exposes: a ranking by coding index, from the leading evals company, of basically every model that matters, in an easy-to-parse format. You can feed it into model-routing harnesses in real time so that, for instance, your agents can dynamically upgrade themselves to better models as they come out, or cost-optimize based on eval results.
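To make the routing idea concrete, here is a minimal sketch of how a harness might consume the output. It assumes the whitespace-separated row layout shown in the table at the top of this thread (score, age, size, name); the `Row` type, `parse_rows`, and `pick_model` are hypothetical names for illustration, not part of the linked script.

```python
from typing import NamedTuple, Optional

# Rows copied from the table posted earlier in this thread.
TABLE = """\
44.2 97 large GLM-5 (Reasoning)
44.7 187 - GPT-5.1 (high)
44.9 29 - Qwen3.6 Max Preview
45 0 - Gemini 3.5 Flash
45.5 27 large MiMo-V2.5-Pro
45.6 75 - GPT-5.4 (low)
"""


class Row(NamedTuple):
    score: float          # coding index score
    age: int              # the "age" column, kept as-is
    size: Optional[str]   # None when the table shows "-"
    name: str


def parse_rows(text: str) -> list[Row]:
    """Parse 'score age size name' lines; the name may contain spaces."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps multi-word model names in one piece.
        score, age, size, name = line.split(maxsplit=3)
        rows.append(Row(float(score), int(age), None if size == "-" else size, name))
    return rows


def pick_model(rows: list[Row], allowed: Optional[set[str]] = None) -> Optional[Row]:
    """Return the highest-scoring row, optionally limited to an allow-list."""
    candidates = [r for r in rows if allowed is None or r.name in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.score)
```

A router could call `pick_model(parse_rows(TABLE))` on a schedule and switch its default model whenever the top entry changes. The `allowed` set stands in for whatever models a given deployment actually has access to.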
I do stuff like this, give it away for free and it's either ignored or makes people angry...
I really wish I didn't piss people off with my sincerity, but somehow it always goes down that way.
I really appreciate your time. Thank you so much.
I see no 'score' or 'age' mentioned in your script. What does age signify and how are they calculated?
This isn't obvious?
Real question. I see 86400 and I know it's time... That might just be me.
I'm not being an ass. I don't know how to talk to people, or how to tell when I think I'm being clear but am actually being cryptic.
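For readers who don't recognize the constant: 86400 is the number of seconds in a day (24 * 60 * 60). A plausible reading is that "age" is the number of whole days since a model's release. The sketch below shows that calculation under that assumption; the script itself is not quoted in this thread, so the function name `age_in_days` and the choice to round down are illustrative only.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400  # 24 hours * 60 minutes * 60 seconds


def age_in_days(released: datetime, now: datetime) -> int:
    """Whole days elapsed between a release time and 'now' (rounded down)."""
    elapsed_seconds = (now - released).total_seconds()
    return int(elapsed_seconds // SECONDS_PER_DAY)


# Example with made-up dates: 27.5 days elapsed rounds down to 27.
released = datetime(2025, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 28, 12, tzinfo=timezone.utc)
example_age = age_in_days(released, now)
```

Under this reading, an age of 0 in the table would mean the model was released less than a day before the data was pulled.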
It is kind of noisy because the release recency, which is what your "age" column actually represents, is not important data for the comparison you are trying to make.
Also, the message we should take from that table is not really obvious.