Comment by yunwal

7 months ago

Luckily there’s a new set of problems every year

2 comments

yunwal

You can really only do a fair reproducible test if the models are static and not sitting behind an api where you have no idea on how they are updated or continuously tweaked.

chvid 7 months ago

This particular test is heralded as some sort of breakthrough and the companies in this field are raising billions of dollars from investors and paying their star employees tens of millions.
The economic incentives to tweak, tune, or cheat are through the roof.