Comment by lcnPylGDnU4H9OF

6 months ago

This is important context given that it would be absurd for the managers to have already drawn a definitive conclusion about the models’ capabilities. An explicit understanding that the purpose of the exercise is to get a better idea of the current strengths and weaknesses of the models in a “real world” context makes this actually very reasonable.

So why in public, and why in the most ham-fisted way, and why on important infrastructure, and why in such a terrible integration that it can't even verify that things compile before opening a PR!

In my org, we would have had to bypass precommit hooks to do this!