Comment by bhekanik

8 hours ago

Strongly agree with the patterns list, but I think the order matters: observability and evals should come before sophistication.

Teams jump straight to multi-agent setups because it feels "advanced," then spend weeks debugging ghosts. A boring single-agent flow with strict tool contracts, replayable traces, and a small regression suite beats a clever architecture you can't reason about.

My rule of thumb now: if I can't explain why an agent made a decision from logs alone, it's not production-ready yet.
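As a rough illustration of "strict tool contracts, replayable traces, and a small regression suite", here is a minimal sketch. It is not from the comment. `ToolContract`, `TraceStep`, `Trace`, `call_tool`, and `replay` are hypothetical names, and it assumes tools are deterministic so that a replayed call should reproduce the recorded result. Each step logs the agent's stated `reason`, so the "explain the decision from logs alone" test has something to check against.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical strict contract: exact argument names and their types."""
    name: str
    arg_types: dict[str, type]

    def validate(self, args: dict[str, Any]) -> None:
        # Reject missing or unexpected arguments instead of letting the tool guess.
        if set(args) != set(self.arg_types):
            raise ValueError(f"{self.name}: expected {sorted(self.arg_types)}, got {sorted(args)}")
        for key, expected in self.arg_types.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{self.name}.{key}: expected {expected.__name__}")


@dataclass
class TraceStep:
    tool: str
    args: dict[str, Any]
    result: Any
    reason: str  # the agent's stated rationale, so the decision is explainable from logs


@dataclass
class Trace:
    steps: list[TraceStep] = field(default_factory=list)

    def to_jsonl(self) -> str:
        # One JSON object per step: easy to store, diff, and replay later.
        return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in self.steps)

    @staticmethod
    def from_jsonl(text: str) -> "Trace":
        return Trace([TraceStep(**json.loads(line)) for line in text.splitlines() if line])


# Registry mapping a tool name to its contract and implementation (assumed shape).
Tools = dict[str, tuple[ToolContract, Callable[..., Any]]]


def call_tool(trace: Trace, tools: Tools, name: str, args: dict[str, Any], reason: str) -> Any:
    contract, fn = tools[name]
    contract.validate(args)  # fail loudly before the tool runs
    result = fn(**args)
    trace.steps.append(TraceStep(name, args, result, reason))
    return result


def replay(trace: Trace, tools: Tools) -> list[int]:
    """Re-run every recorded call; return indices of steps whose result changed."""
    mismatches = []
    for i, step in enumerate(trace.steps):
        contract, fn = tools[step.tool]
        contract.validate(step.args)
        if fn(**step.args) != step.result:
            mismatches.append(i)
    return mismatches
```

Under these assumptions, a "small regression suite" could simply be a folder of saved traces where `replay` must return an empty list after every change. Non-deterministic tools (network calls, LLM calls) would need their results stubbed from the trace instead.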

I have come to the same conclusion. I think of it more as "raising" these subagents through iterations and tuning until they are "grown-up" and reliable. That's why, even though I can set up a team pretty easily via Claude Code, I don't see the benefit until the would-be team members are reliable. Once the main subagents are solid, we can move on to building a team by pointing it at these subagents. At least that's my thinking, in my slow, one-step-at-a-time way. I'm most probably overcautious and maybe even wrong, but if I see a subagent doing weird stuff across many executions, I can't build many layers on top of it.
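One way to make "reliable across many executions" concrete is a simple promotion gate: run a subagent repeatedly on a fixed set of cases and only compose it into a team once its pass rate clears a threshold. This is a hypothetical sketch, not anything Claude Code provides. `pass_rate`, `ready_to_compose`, the default 20 runs, and the 0.95 threshold are all illustrative assumptions, and a subagent is modeled as a plain `str -> str` function.

```python
import itertools
from typing import Callable, Sequence

Subagent = Callable[[str], str]              # assumed: prompt in, output out
Case = tuple[str, Callable[[str], bool]]     # (prompt, check on the output)


def pass_rate(subagent: Subagent, cases: Sequence[Case], runs_per_case: int) -> float:
    """Run every case repeatedly; return the fraction of runs whose output passes its check."""
    if not cases or runs_per_case < 1:
        raise ValueError("need at least one case and one run per case")
    passed = 0
    for prompt, check in cases:
        for _ in range(runs_per_case):
            if check(subagent(prompt)):
                passed += 1
    return passed / (len(cases) * runs_per_case)


def ready_to_compose(subagent: Subagent, cases: Sequence[Case],
                     runs_per_case: int = 20, threshold: float = 0.95) -> bool:
    # Hypothetical promotion rule: only build a team on top of a subagent
    # that clears the threshold across many repeated executions.
    return pass_rate(subagent, cases, runs_per_case) >= threshold


if __name__ == "__main__":
    counter = itertools.count(1)

    def flaky(prompt: str) -> str:
        # Simulates "weird stuff": fails on every 4th call.
        return "" if next(counter) % 4 == 0 else prompt.upper()

    cases = [("hi", lambda out: out == "HI")]
    print(pass_rate(flaky, cases, runs_per_case=8))  # 0.75
```

The repeated runs per case matter more than the number of cases here: a subagent that passes once can still fail one run in four, and only repetition surfaces that.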