Comment by trq_

6 months ago

Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.

3 comments

amelius 6 months ago

Can't you keep the model the same, until the user chooses to use a different model?

rovr138 6 months ago

He said it was the harness, not the model, though.

hu3 6 months ago

Thank you. Fair enough.