It's a combination of reasoning effort (max) + enabling workflows, which orchestrate multiple sub-agents.
After some interrogation, here's how it organized the work:
1. Design workflow (rts-game-design, 11 agents, ~13 min) ran first and produced SPEC.md + DESIGN.md:
1.1 Proposals (3 parallel agents): each designed a complete RTS from a different philosophy.
1.2 Judge (1 agent): evaluated all three and synthesized one unified design, committing to specific numbers (costs, HP, map size, etc.).
1.3 Deep-dives (6 parallel agents): each wrote an implementation-ready spec for one subsystem, all consistent with the chosen design.
1.4 Synthesis (1 agent): merged the design + all six subsystem specs into one conflict-free master spec.
2. Code-review workflow (rts-code-review, 25 agents, ~5 min) ran after the main agent had written and tested the code:
2.1 Review (6 agents, read-only Explore type): each scrutinized one dimension and returned structured findings.
2.2 Verify (19 agents): every finding got its own skeptic agent told to try to refute it. Result: 19 flagged → 16 confirmed, 3 rejected as non-bugs.
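For anyone trying to picture the shape of that, here is a rough sketch of the fan-out/fan-in pattern described above. It is not how Claude Code implements workflows; `run_agent` is a hypothetical stub that only records which role was called, the philosophy/subsystem/dimension names are placeholders, and the split of the 19 findings across the 6 reviewers is made up (the post only gives the total).

```python
import asyncio


async def run_agent(log, role, task):
    # Hypothetical stand-in for a real sub-agent call: it just records the role.
    log.append(role)
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{role}:{task}"


async def design_workflow(log, philosophies, subsystems):
    # 1.1 Proposals: one agent per design philosophy, run in parallel.
    proposals = await asyncio.gather(
        *(run_agent(log, "proposal", p) for p in philosophies)
    )
    # 1.2 Judge: a single agent picks/merges one unified design.
    design = await run_agent(log, "judge", " | ".join(proposals))
    # 1.3 Deep-dives: one agent per subsystem, run in parallel.
    specs = await asyncio.gather(
        *(run_agent(log, "deep-dive", s) for s in subsystems)
    )
    # 1.4 Synthesis: a single agent merges everything into one spec.
    return await run_agent(log, "synthesis", design + " + " + " + ".join(specs))


async def review_workflow(log, dimensions, findings_per_dimension):
    # 2.1 Review: one read-only reviewer per dimension, run in parallel.
    reviews = await asyncio.gather(
        *(run_agent(log, "review", d) for d in dimensions)
    )
    # Assumed: each reviewer returns some number of findings (counts are invented).
    findings = [
        f"{review}#{i}"
        for review, count in zip(reviews, findings_per_dimension)
        for i in range(count)
    ]
    # 2.2 Verify: every finding gets its own skeptic agent.
    return await asyncio.gather(*(run_agent(log, "verify", f) for f in findings))


if __name__ == "__main__":
    design_log, review_log = [], []
    asyncio.run(design_workflow(
        design_log,
        ["philosophy-A", "philosophy-B", "philosophy-C"],
        [f"subsystem-{i}" for i in range(1, 7)],
    ))
    asyncio.run(review_workflow(
        review_log,
        [f"dimension-{i}" for i in range(1, 7)],
        [4, 3, 3, 3, 3, 3],  # arbitrary split that sums to the 19 reported findings
    ))
    print(len(design_log), len(review_log))  # prints "11 25"
```

The agent counts fall out of the structure: 3 + 1 + 6 + 1 = 11 for design, and 6 reviewers + 19 verifiers = 25 for review, which matches the numbers in the list.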
What the main agent did in the main loop:
- Wrote all ~2,400 lines of index.html by hand from the spec.
- Did all browser testing/debugging via headless Chrome (I told it to use rodney by @simonw; love the tool :)
- Applied all 16 fixes from the review and re-verified them in the browser.
Seems like a Rube Goldberg-esque way to consume 10x the tokens. Is this really where the industry is heading?
I like to think of it as the difference between dropping a ball on a roulette wheel (you get one random number, or a random sequence if you repeat it) vs. dropping a ball on a carved topographic map, where valleys guide the ball to a particular outcome.
If you can stand a little AI expansion, here are a few points Gemini came up with. I think the idea has some merit:
https://g.co/gemini/share/b5b97867eeb1
(Maybe the better analogy is roulette vs pinball machine)
Why is it Rube Goldbergesque? The process doesn't seem arbitrary.
Just to confirm - you did not generate this plan/orchestration/harness - it did all that on its own?
Correct, that's the "workflows" part they introduced in Claude Code alongside the new model.
Did you start with a clean slate or do you have global ~/.claude/CLAUDE.md and/or specific skills, plugins, etc?
I don't have a global CLAUDE.md, and the only non-default skill I have that was used here is the one for the rodney[0] headless browser. I didn't expressly tell Claude to do browser testing; it decided to do it on its own.
So no extra guidance beyond the prompt.
[0] https://github.com/simonw/rodney/
Thanks for sharing this. Going to try it out on a game inspired by Rust. It's helpful re: the point on rodney - I've had a hard time getting the testing to work well in the browser.
it's a brand new mode
Biases the model to solve problems with teams of agents