Comment by jameson
6 hours ago
Why can't Claude Code generate effective harness for us by inspecting the code base?
I tried defining CLAUDE.md (or AGENTS.md), skills, and plugins, but I'm not getting the effectiveness others claim to get. Take the LSP plugin for example: CC doesn't use the LSP's symbol rename and instead edits files one by one, slowly. Or it doesn't invoke a skill even when I've explicitly told it to invoke it whenever the prompt contains a specific cue.
Am I using it wrong? Is there a robust example harness I can copy?
This is a pain point that has existed for years now and it's still not solved at all.
"If A, do X. Do B,C,D. Do A" - and it just never uses X because "it forgot".
You just can't trust that the time you spend building rules will actually pay off; in fact, you can trust that it will fail you sooner or later.
RAG, harnesses, skills... all of it was supposed to fix this, but in reality it never has.
Harnesses do fix it IMO - it’s why Claude Code and Codex had a massive jump in alleged productivity on release and then seem to have flatlined. But a custom harness _would_ allow you to do things like “on every message, run lint validation and tests”. That in and of itself would be wildly useful.
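For what it's worth, Claude Code's hooks get you part of the way there today. A sketch of a `.claude/settings.json` fragment, assuming the current hooks schema (a `PostToolUse` matcher running a shell command after every edit; swap in your project's actual lint/test commands):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint && npm test" }
        ]
      }
    ]
  }
}
```

It's per-tool-call rather than per-message, but it does mean the model sees lint/test failures immediately after every file change.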
The harnesses we have are almost stunningly incomplete IMHO. I've been trying `pi` recently, and quite like that it comes with a minimal set of tools by default -- and that I can easily override or replace the ones that it ships.
I've only just started working with it, but clamping `read/write/edit` to only allow editing files in the current directory, banning `bash` and mandating I write tools for the specific commands I want it to execute, has made me much happier. Running Claude inside a VM or similar to sandbox it is nuclear overkill; I've always been surprised that that's seemed like the state of the art.
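Clamping file tools to the current directory is cheap to build yourself. A minimal sketch in Python (names are mine, not from any particular harness): resolve every tool-supplied path and refuse anything that escapes the project root.

```python
import os

ROOT = os.path.realpath(os.getcwd())

def safe_path(path: str) -> str:
    """Resolve a tool-supplied path; refuse anything outside ROOT.

    Every read/write/edit tool in the harness calls this first, so the
    model cannot reach files outside the current project. realpath
    collapses both symlinks and `..` tricks before the check.
    """
    resolved = os.path.realpath(os.path.join(ROOT, path))
    if os.path.commonpath([ROOT, resolved]) != ROOT:
        raise PermissionError(f"refusing to touch {path!r}: outside {ROOT}")
    return resolved
```

Much lighter than a VM, and it fails closed: a path that can't be proven to be inside the project root is simply rejected.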
With a better harness, the model can't choose to rename things with search and replace; if it wants to rename things, it _must_ call the LSP to do it. If it's going to write code, as you suggest, the harness _forces_ linting/formatting to run.
(Reading my own comment back, I am worried that the fucking AI writing style is infecting me :()
a colleague using OpenCode was telling me it has linting/formatting configurable at the harness level, and I can't see why this isn't in every harness
> Am I using it wrong?
I stopped using `/init` and having CLAUDE|AGENTS.md files that explained the codebase. The only thing I kept was how it should explore the codebase and use `git log` when researching, which is probably redundant too. I can't figure it out either.
The codebase I work on is roughly 100k LOC so idk if it is considered large. Personally it's the largest repo I have worked on.
What seems to work in some cases are hooks with scripts that feed into the context window (I've had to strip out some of the unnecessary linter messaging to limit context). Linters and/or other language specific checkers that can be installed via OS package repository and called via script. Also, the model + skill context together could make a difference. Skills that "worked" on 4.6 may not work as well on 4.7, which seems to require more explicit direction, but is more reliable by comparison to 4.6. Updating skills might help too. Test and run before/after to check. CC also injects unnecessary tool calls into context, so you may need to suppress tasks if you're a beads fan for example.
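On stripping the unnecessary linter messaging: this is the shape of hook script I mean, as a tiny sketch (the `"error"`-substring heuristic is purely illustrative; match whatever your linter actually emits):

```python
import sys

def keep_errors(lines):
    """Drop warnings/info so the hook feeds only actionable lines
    into the context window instead of the linter's full output."""
    return [line for line in lines if "error" in line.lower()]

if __name__ == "__main__":
    # e.g. run `eslint . 2>&1 | python filter_lint.py` from the hook
    sys.stdout.write("".join(keep_errors(sys.stdin.readlines())))
```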