Comment by eterm

20 hours ago

4. The graph starts January 8.

Why January 8? Was that an outlier high point?

IIRC, Opus 4.5 was released in late November.

Right after the holiday double-token promotion, users perceived a huge regression in capabilities. I bet that triggered the idea.

Or maybe, just maybe, that's when they started testing…

  • The Wayback Machine has nothing for this site before today, and the article says "last updated Jan 29".

    A benchmark like this ought to start fresh from when it is published.

    I don't entirely doubt the degradation, but the choice of how far back they went feels a bit cherry-picked to demonstrate the value of the benchmark.

    • Which makes sense: you gotta wait until you have enough data before you can communicate about said data…

      If anything, it's consistent with the fact that they very likely didn't have data earlier than January 8.