Comment by vidarh
5 days ago
Without measuring quality of output, this seems irrelevant to me.
My use of CLAUDE.md is to get Claude to avoid making stupid mistakes that will require subsequent refactoring or cleanup passes.
Performance is not a consideration.
If anything, beyond CLAUDE.md I add agent harnesses that often increase the time and tokens used many times over, because my time is more expensive than the agents'.
CLAUDE.md isn't a silver bullet either; I've had it lose context a couple of questions deep. I do like GSD[1] though, it's been a great addition to the stack. I also use multiple different LLMs as judges for PRs, which catches a load of issues too.
[1] https://github.com/gsd-build/get-shit-done
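To give a rough sense of the multi-LLM judge idea: a minimal sketch where each judge is just a callable mapping a prompt to a reply. The `judge_pr` helper and the model names are hypothetical stand-ins, not tied to GSD or any specific vendor API.

    # Minimal sketch: send the same PR diff to several independent LLM
    # judges and collect each one's findings. `judges` maps a name to any
    # callable that takes a prompt string and returns the model's reply;
    # the names and helpers here are hypothetical, not a real vendor API.
    from typing import Callable

    PROMPT = (
        "Review this PR diff and list concrete problems only "
        "(bugs, missing tests, risky refactors). Reply NONE if clean.\n\n{diff}"
    )

    def judge_pr(diff: str, judges: dict[str, Callable[[str], str]]) -> dict[str, str]:
        # Same prompt to every judge, findings keyed by judge name.
        prompt = PROMPT.format(diff=diff)
        return {name: ask(prompt) for name, ask in judges.items()}

    # Usage: plug in clients from different model families so their blind
    # spots don't overlap, then keep only the judges that flagged something.
    # findings = judge_pr(diff_text, {"model_a": ask_a, "model_b": ask_b})
    # issues = {k: v for k, v in findings.items() if v.strip() != "NONE"}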
In this context, "performance" means "does it do what we want it to do," not "does it do it quickly." Quality of output is what they're measuring; speed is not a consideration.
The point is that whether it does what you tell it in a single iteration is less important than whether it avoids stupid mistakes. Any serious use will put it in a harness.
My point is that you misread the comment you replied to. (By the way, on page 2 of the paper: "we evaluate each LLM only within its corresponding harness.")