← Back to context

Comment by cco

1 day ago

You're surprised? I think harnesses are almost as important as the underlying model. Folks have been able to improve benchmark results by nearly 2x based on harness alone.

Harnesses are quickly becoming critical components of the "model" itself imo. Not shocking to me at all that a company that spots a revenue opportunity is keeping its harness closed source.

I'm a neophyte. What makes a harness special or all that unique from another? I've had a reasonable experience with Zed and local models, but could be persuaded to put something else in the mix if there is a measurable benefit to be had.

  • Simple example: a while back LLMs would trip over questions like "how many Rs are in strawberry". Now, the system prompts have a line like "when a user asks for a count, actually count the value by calling a tool if needed". The LLMs cannot get smarter in this regard, next token predictors will hallucinate here.

    A harness is that covering every blind spot or sub-optimal but probable output people have hit in the wild, and a lot of problems just have better solutions if you say "break problem A into subproblem B and subproblem C, then solve".

Source? The most trusted benchmark right now (deepSWE) scores better or just as well on their minimal harness than when using CC or codex