
Comment by gtirloni

5 days ago

I was using this and Superpowers, but eventually Plan mode became enough and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved, but in my experience they burn 10x more tokens. I was always hitting the Max plan limits for no discernible benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.

I ended up grafting the brainstorm, design, and implementation planning skills from Superpowers onto a Ralph-based implementation layer that doesn't ask for my input once the implementation plan is complete. I have to run it in a Docker sandbox because it runs with permissions dangerously skipped, but that is probably a good idea anyway.

It's working, and I'm enjoying how productive it is, but it feels like a step on a journey rather than the actual destination. I'm looking forward to seeing where this journey ends up.

  • I find that simple Ralph loops, with an implementer and a reviewer repeating until everything passes review and unit tests, are 90% of the job.

    I would love to do something more sophisticated, but there's an irony here: over the past few decades, when I played both roles in this loop myself, the loop got faster and faster as computers got faster. Now I'm back to waiting on agentic loops just like I used to wait for compilations on large codebases.

  • If it is working, why is it just a step on a journey? What is missing?

    • It's a kludged-together dev process made up of two different systems, run in a Docker container so potential damage is contained. It's not ideal ;)

      Neither of those two systems feels fully evolved either. Superpowers is very cool, but there are still holes. And Ralph feels like an experiment that worked, so they published it.

      This is all going somewhere, evolving and moving towards some beautiful system. Or maybe the usual dev ecosystem shit - it'll be a great prototype and then it'll get overthought, overcomplicated and overengineered and end up less usable than what we had before *glares at React*

  • did you hand modify the superpowers skills or are you managing this some other way?

    • For me, I just created my own prompt pipeline, with a nod towards GANs. All of the necessary permissions get surfaced so I don't need to babysit it, and all are relatively simple. No need for YOLO mode or dangerously skipping permissions.

    • yeah, I copied the skills I wanted into a directory, hacked away at them until they did what I wanted, and then added them to the Dockerfile for the sandbox.
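
The "simple Ralph loop" pattern described in this thread (an implementer and a reviewer alternating until review and unit tests pass) can be sketched as plain orchestration code. This is a minimal sketch under assumptions: `run_agent` is a hypothetical stand-in for however you invoke the coding agent, and "LGTM" is an arbitrary stop token, not part of any of these tools.

```python
import subprocess

def run_agent(prompt: str) -> str:
    """Hypothetical helper: invoke the coding agent with a fresh
    context and return its final message (e.g. by shelling out to
    the agent's CLI). Left unimplemented here."""
    raise NotImplementedError

def tests_pass() -> bool:
    """Gate: the loop may only stop once the unit tests are green."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def ralph_loop(task: str, max_iters: int = 20) -> bool:
    feedback = ""
    for _ in range(max_iters):
        # Implementer sees the task plus the previous review notes.
        run_agent(f"Implement: {task}\nReviewer feedback:\n{feedback}")
        # Reviewer critiques the result; "LGTM" acts as the stop token.
        feedback = run_agent(f"Review the latest changes for: {task}")
        if "LGTM" in feedback and tests_pass():
            return True
    return False  # budget exhausted; a human should take over
```

The point is that the loop itself is dumb and deterministic: all the intelligence lives in the two prompts, and the hard exit condition (green tests) is checked by ordinary code rather than by asking the model whether it is done.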

I've gone the other way recently, shifting from pure plan mode to Superpowers. I was reminded of it by the announcement of the latest version.

It is perhaps confirmation bias on my part, but I've been finding it does a better job on similar problems than I was getting with base plan mode. I attribute this to its multiple layers of cross-checks and self-reviews. Yes, I could do that by hand of course, but I find Superpowers automates what I was already trying to accomplish in this regard.

  • Yes, it does help in that way. Maybe I'm still struggling to let go and let AI take the wheel from beginning to end, but I enjoy the exploratory part of the whole process (investigating possible solutions, trying theories, doing little spikes, etc., all with CC's assistance). When it's time to actually code, I just let it do its own thing, mostly unsupervised. I do spend quite a lot of time on spec writing.

    • That’s part of what I’ve liked about it over plan mode. Again, not a scientific measurement, but I feel it’s better at interactive brainstorming and researching the big picture with me. And its built-in checkpoints also give me more space to pivot or course-correct.

Just tried GSD and Plan Mode on the exact same task (prompt in an MD file). Plan Mode produced a plan and then a base implementation in twenty minutes. GSD ran for hours to achieve the same thing.

I reviewed the code from both and the GSD code was definitely written with the rest of the project and possibilities in mind, while the Claude Plan was just enough for the MVP.

I can see both having their pros and cons depending on your workflow and size of the task.

I use GitHub Copilot and unfortunately there has been a weird regression in the bundled Plan mode. When they added the new plan memory, it suddenly started getting both VERY verbose in the plan output and vague in the details. It adds a lot of steps like "design" and "figure out", and railroads you into implementation without asking follow-up questions.

  • I find that even with Opus 4.6, Copilot feels handicapped. I’m not sure if it’s related to memory or what, but if I give two tasks to Opus 4.6, one in CC and one in Copilot, CC is substantially better.

    I’ve been really enjoying Codex CLI recently, though. It seems to do just as well as Opus 4.6, but using the standard GPT 5.4.

    • I have the same experience with Antigravity and Gemini CLI, both using Gemini 3 Pro. The CLI works on the problem with more effort and time. Meanwhile, Antigravity writes shitty Python scripts for a few seconds and calls it a day. The agent harness matters a lot.

    • I think this shows that the model alone isn't the complete story and that these "harnesses" (as people seem to be calling them) shape a lot of the experienced behavior of these tools.


  • > VERY verbose in the plan output

    Is that an issue? GitHub charges per-request, not per-token, so a verbose output and a short output cost the same.

    What model are you using?

    • The problem might be that our brains charge per token, which makes reviewing hard. :)

Same experience. Superpowers is a little too overzealous at times. For coding especially, I don’t like seeing a comprehensive design spec written (good) and then turned into effectively the same doc, macro-expanded into a complete implementation with the literal code for the entire thing in a second doc (bad). Even for trivial changes I’d end up with a good, succinct -design.md, then an -implementation.md, and then a swarm of sub-agents getting into races while more or less just grabbing a block from the implementation file and writing it out.

A mess. I still enjoy Superpowers brainstorming, but I pull the chute towards the end and deliver the implementation myself.

  • Yes. I sometimes had to specifically ask it to NOT add any code to the specs because that would be done at a later stage.

Yup yup yup. I burned literally a week's worth of the $20 Claude subscription and then $20 worth of API credits on GSD v2, to get like 500 LOC.

And that was AFTER burning a week's worth of the Codex and Claude $20 plans and $50 of API credits and getting completely bumfucked - the AI was faking out tests, etc.

I had better experiences just guiding the thing myself. It definitely was not a set-and-forget experience (6 hours of constant monitoring), but I was able to get a full research MVP that informed the next iteration with only 75% of a Codex weekly plan.

I've played around a bit with the plugins and, as you said, plan mode really handles things fine for the most part. I've got various workflows I run through in Claude, and having CC create custom skills/agents for them gets me 80% of the way there. Letting the Claude file refer to them, rather than trying to define entire workflows within it, also goes a long way. It'll still forget things here and there, leading to wasted tokens as it realizes it's being dumb and corrects itself, but nothing too crazy. At the very least, it's more than enough to let me continue using it naturally rather than memorizing a million slash commands to invoke manually.
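
For reference, a skill of this kind is just a small instruction file the agent loads on demand; in Claude Code the convention is a SKILL.md with YAML frontmatter under `.claude/skills/<name>/`. The skill below is a made-up example (the workflow and the `docs/release-style.md` path are illustrative, not from any real project):

```markdown
---
name: release-notes
description: Draft release notes from commits since the last tag.
---

1. Run `git log $(git describe --tags --abbrev=0)..HEAD --oneline`.
2. Group the commits into features, fixes, and chores.
3. Draft the notes following the style guide in `docs/release-style.md`.
```

CLAUDE.md then only needs a one-line pointer ("use the release-notes skill for changelog work") instead of carrying the whole workflow inline.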

I have been using Superpowers for Gryph development for a while. Love the brainstorming and exploration that it brings in. Haven’t really compared token usage, but it's on my list.

> I was using this and superpowers but eventually, Plan mode became enough and I prefer to steer Claude Code myself.

Plan mode is great, but to me that's just prompting your LLM agent of choice to generate an ad-hoc, imprecise, and incomplete spec.

The downside of specs is that they can consume a lot of context window with things that are not needed for the task. When that is a concern, passing the spec to plan mode tends to mitigate the issue.

Why use CLI wrappers if you're using Claude Code? I get it if you need something like Codex, but they released sub-agents today, so maybe not even that; for Claude Code it's an unnecessary wrapper.

  • Wrappers are useful for some tasks. I use Ralph loops for things that are extremely complicated and take days of work, like reverse-engineering projects or large-scale migration efforts.

    • Even with the 1M-token context windows? Can't you just keep the orchestrator going and run sub-agents? Maybe the added space is too new? I also haven't tested the context rot from 300K tokens and up. Would love some color on it from first-hand experience.


  • So that you can have a fresh context for every little thing. These harnesses basically marry LLMs with deterministic software logic. The harness programmatically generates the prompts and stores the output, step by step.

    You never want the LLM to do anything that deterministic software does better, because it inflates the context and is not guaranteed to be done accurately. This includes things like tracking progress, figuring out dependency ordering, etc.