Comment by meisel
4 hours ago
I wonder if their "5.3" was continuously being updated, with benchmarks regenerated after each improvement, and they just stayed ready to release it whenever the new Claude was released.
This seems plausible. It would be shocking if these companies didn't have an automated test suite that recomputes these benchmarks on a regular basis and uploads the results to a dashboard for review.
Given that they already pre-approved the various language and marketing materials, there's no real reason they couldn't just leave the release lined up behind a single function call, ready to go live once the key players make the call.