Comment by lcnPylGDnU4H9OF
6 months ago
This is important context given that it would be absurd for the managers to have already drawn a definitive conclusion about the models’ capabilities. An explicit understanding that the purpose of the exercise is to get a better idea of the current strengths and weaknesses of the models in a “real world” context makes this actually very reasonable.
So why in public, and why in the most ham-fisted way, and why on important infrastructure, and why in such a terrible integration that it can't even verify that things compile before opening a PR!
In my org, we would have had to bypass precommit hooks to do this!