Comment by nickstinemates

3 hours ago

You write a generic architecture document on how you want your code base to be organized, when to use pattern x vs pattern y, examples of what that looks like in your code base, and you encode this as a skill.

Then, in your prompt you tell it the task you want, then you say, supervise the implementation with a sub agent that follows the architecture skill. Evaluate any proposed changes.

There are people who maximize this, and this is how you get things like teams. You make agents for planning, design, qa, product, engineering, review, release management, etc. and you get them to operate and coordinate to produce an outcome.

That's what this is supposed to be, encoded as a feature instead of a best practice.

9 comments

nickstinemates

satellite2 3 hours ago

Aren't you just moving the problem a little bit further? If you can't trust it will implement carefully specified features, why would you believe it would properly review those?

frde_me 2 hours ago
It's hard to explain, but I've found LLMs to be significantly better in the "review" stage than the implementation stage.
So the LLM will do something and not catch at all that it did it badly. But the same LLM asked to review against the same starting requirement will catch the problem almost always
The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.
- resonious 2 hours ago
  
  I've noticed this too and am wondering why this hasn't been baked into the popular agents yet. Or maybe it has and it just hasn't panned out?
  
  1 reply →

tclancy 3 hours ago

How does this not use up tokens incredibly fast though? I have a Pro subscription and bang up against the limits pretty regularly.

doctoboggan 3 hours ago
It _does_ use up tokens incredibly fast, which is probably why Anthropic is developing this feature. This is mostly for corporations using the API, not individuals on a plan.
- digdugdirk 3 hours ago
  
  I'd love to see a breakdown of the token consumption of inaccurate/errored/unused task branches for claude code and codex. It seems like a great revenue source for the model providers.
  
  1 reply →
andyferris 2 hours ago

It does use tokens faster, yes.