Comment by yorwba
6 days ago
In this context, "performance" means "does it do what we want it to do", not "does it do it quickly". Quality of output is what they're measuring; speed is not a consideration.
The point is that whether it does what you tell it in a single iteration is less important than whether it avoids stupid mistakes. Any serious use will put it in a harness.
My point is that you misread the comment you replied to. (By the way, on page 2 of the paper: "we evaluate each LLM only within its corresponding harness.")
> My point is that you misread the comment you replied to.
I'm not the person you replied to.
> (By the way, on page 2 of the paper: "we evaluate each LLM only within its corresponding harness.")
That has zero relevance to my comment, or to the type of harnesses I talked about in the comment you replied to or in my comment up-thread.