
Comment by kristopolous

8 hours ago

I've built a tool to track these.

Relatively speaking, here's where it's at:

    score  age  size    name
    44.2   97   large   GLM-5 (Reasoning)
    44.7   187  -       GPT-5.1 (high)
    44.9   29   -       Qwen3.6 Max Preview
    45     0    -       Gemini 3.5 Flash
    45.5   27   large   MiMo-V2.5-Pro
    45.6   75   -       GPT-5.4 (low)

This is from Artificial Analysis, using https://github.com/day50-dev/aa-eval-email/blob/main/art-ana...

I really don't know why people downvote me. What do I need to say to make things for free that people like? Sincere question. I put a lot of time and generosity into these things and all I usually get is a bunch of "fuck yous".

This is honestly an existential issue for me. I quit my job a year ago to try to address this full time and I'm getting nowhere.

Buddy, this tone may be why.

We genuinely don't understand what your post is about. What is this tool? What do these numbers represent? Why are things sorted in that order?

You haven't really communicated anything at all. I am interested, and I'd like to understand. Write a more complete post, please.

  • Are you familiar with https://artificialanalysis.ai/leaderboards/models ?

    The JSON on the page has a coding index result that it hides from the table.

    That's what this exposes: the coding index, from the leading evals company, for basically every model that matters, sorted and presented in an easy-to-parse format. You can feed it into model routing harnesses in real time so that, for instance, your agents can dynamically upgrade themselves to better models as they come out, or cost-optimize based on eval results.

    I do stuff like this and give it away for free, and it's either ignored or makes people angry...

    I really wish I didn't piss people off with my sincerity, but somehow it always goes down that way.

    I really appreciate your time, thank you so much.
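As a rough illustration of the "feed it into a routing harness" idea above, here is a minimal Python sketch that parses the whitespace-separated table from the top comment and picks the highest-scoring model. The names `ModelRow`, `parse_rows`, and `best_model` are hypothetical and not part of the linked tool, and treating `-` in the size column as "unknown" is an assumption.

```python
from typing import List, NamedTuple, Optional

# Sample rows copied from the table in the original comment.
SAMPLE = """
    score  age  size    name
    44.2   97   large   GLM-5 (Reasoning)
    44.7   187  -       GPT-5.1 (high)
    44.9   29   -       Qwen3.6 Max Preview
    45     0    -       Gemini 3.5 Flash
    45.5   27   large   MiMo-V2.5-Pro
    45.6   75   -       GPT-5.4 (low)
"""


class ModelRow(NamedTuple):
    score: float          # coding index, rounded to one decimal
    age_days: int         # days since the listed release date
    size: Optional[str]   # None when the table shows "-" (assumed to mean unknown)
    name: str


def parse_rows(text: str) -> List[ModelRow]:
    """Parse 'score age size name' lines; the name may contain spaces."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(None, 3)  # split into at most 4 fields
        if len(parts) < 4 or parts[0] == "score":
            continue  # skip the header and malformed lines
        score, age, size, name = parts
        rows.append(
            ModelRow(float(score), int(age), None if size == "-" else size, name.strip())
        )
    return rows


def best_model(rows: List[ModelRow]) -> ModelRow:
    """Return the row with the highest coding score."""
    return max(rows, key=lambda r: r.score)


if __name__ == "__main__":
    print(best_model(parse_rows(SAMPLE)).name)
```

A real router would presumably also weigh cost and availability, which this table does not carry.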

I see no 'score' or 'age' mentioned in your script. What does age signify, and how are the two calculated?

  • This isn't obvious?

        "\(
            10 * (.codingIndex // 0) | round / 10
        ) \(
          (
            now - (
            .releaseDate |
              try ( strptime("%Y-%m-%d") | mktime )
              catch (now + 86400)
          ) ) / 86400 | floor
    

    Real question. I see 86400 and I know it's time... That might just be me.

    I'm not being an ass. I don't know how to talk to people, or how to tell when I think I'm being clear but am actually being cryptic.
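For readers who don't use jq, the excerpt above can be read as two small calculations: the score is `codingIndex` (0 when missing) rounded to one decimal place, and the age is the number of whole days since `releaseDate`. Below is a Python sketch of that reading. The function names are hypothetical, and it assumes scores are non-negative and that dates are interpreted as UTC.

```python
import math
import time
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def score(coding_index):
    """Mirror of: 10 * (.codingIndex // 0) | round / 10"""
    value = 0 if coding_index is None else coding_index
    # floor(x + 0.5) rounds halves up, which matches jq's round for
    # non-negative inputs (Python's built-in round() rounds halves to even).
    return math.floor(10 * value + 0.5) / 10


def age_days(release_date, now=None):
    """Mirror of: (now - released) / 86400 | floor, with the jq try/catch fallback."""
    now = time.time() if now is None else now
    try:
        released = (
            datetime.strptime(release_date, "%Y-%m-%d")
            .replace(tzinfo=timezone.utc)
            .timestamp()
        )
    except (TypeError, ValueError):
        # Missing or unparseable date: pretend the release is one day in the
        # future, so the age comes out as -1 in this sketch.
        released = now + SECONDS_PER_DAY
    return math.floor((now - released) / SECONDS_PER_DAY)


if __name__ == "__main__":
    print(score(44.23), age_days("2025-01-01"))
```

So a model released today shows an age of 0, and a row with no usable release date shows a negative age rather than failing.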

    • It is kind of noisy because the release recency, which is what your "age" column actually represents, is not important data for the comparison you are trying to make.

      Also, the message we should take from that table is not really obvious.
