
Comment by gtirloni

5 days ago

I was using this and Superpowers, but eventually Plan mode became enough and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved, but in my experience they burn 10x more tokens. I was always hitting the Max plan limits for no discernible benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.

I ended up grafting the brainstorm, design, and implementation planning skills from Superpowers onto a Ralph-based implementation layer that doesn't ask for my input once the implementation plan is complete. I have to run it in a Docker sandbox because it runs with permissions dangerously skipped, but that is probably a good idea anyway.

It's working, and I'm enjoying how productive it is, but it feels like a step on a journey rather than the actual destination. I'm looking forward to seeing where this journey ends up.

  • I find that simple Ralph loops, with an implementer and a reviewer repeating until everything passes review and unit tests, are 90% of the job.

    I would love to do something more sophisticated, but there's an irony here: over the past few decades, when I played both roles in this loop myself, the loop got faster and faster as computers got faster. Now I'm back to waiting on agentic loops just like I used to wait for compilations on large codebases.

  • If it is working, why is it just a step on a journey? What is missing?

    • It's a kludged-together dev process made up of two different systems, run in a Docker container so potential damage is contained. It's not ideal ;)

      Neither of those two systems feels fully evolved either. Superpowers is very cool, but there are still holes. And Ralph feels like an experiment that worked, so they published it.

      This is all going somewhere, evolving and moving towards some beautiful system. Or maybe the usual dev ecosystem shit - it'll be a great prototype and then it'll get overthought, overcomplicated and overengineered and end up less usable than what we had before *glares at React*

  • did you hand modify the superpowers skills or are you managing this some other way?

    • For me, I just created my own prompt pipeline, with a nod towards GANs. All of the necessary permissions get surfaced so I don't need to babysit it, and all are relatively simple. No need for YOLO mode or dangerously skipping permissions.

    • yeah, I copied the skills I wanted into a directory, hacked away at them until they did what I wanted, and then added them to the Dockerfile for the sandbox.
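
The "simple Ralph loop" pattern described in this thread (an implementer and a reviewer alternating until review and unit tests pass) can be sketched as plain orchestration code. This is a minimal sketch under assumptions: `run_agent` is a hypothetical stand-in for however you invoke the coding agent, and "LGTM" is an arbitrary stop token, not part of any of these tools.

```python
import subprocess

def run_agent(prompt: str) -> str:
    """Hypothetical helper: invoke the coding agent with a fresh
    context and return its final message (e.g. by shelling out to
    the agent's CLI). Left unimplemented here."""
    raise NotImplementedError

def tests_pass() -> bool:
    """Gate: the loop may only stop once the unit tests are green."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def ralph_loop(task: str, max_iters: int = 20) -> bool:
    feedback = ""
    for _ in range(max_iters):
        # Implementer sees the task plus the previous review notes.
        run_agent(f"Implement: {task}\nReviewer feedback:\n{feedback}")
        # Reviewer critiques the result; "LGTM" acts as the stop token.
        feedback = run_agent(f"Review the latest changes for: {task}")
        if "LGTM" in feedback and tests_pass():
            return True
    return False  # budget exhausted; a human should take over
```

The point is that the loop itself is dumb and deterministic: all the intelligence lives in the two prompts, and the hard exit condition (green tests) is checked by ordinary code rather than by asking the model whether it is done.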

I've gone the other way recently, shifting from pure plan mode to Superpowers. I was reminded of it by the announcement of the latest version.

It is perhaps confirmation bias on my part, but I've been finding it does a better job on similar problems than I was getting with base plan mode. I attribute this to its multiple layers of cross-checks and self-reviews. Yes, I could do that by hand of course, but I find Superpowers automates what I was already trying to accomplish in this regard.

  • Yes, it does help in that way. Maybe I'm still struggling to let go and let AI take the wheel from beginning to end, but I enjoy the exploratory part of the whole process (investigating possible solutions, trying theories, doing little spikes, etc., all with CC's assistance). When it's time to actually code, I just let it do its own thing, mostly unsupervised. I do spend quite a lot of time on spec writing.

    • That’s part of what I’ve liked about it over plan mode. Again, not a scientific measurement, but I feel it’s better at interactive brainstorming and researching the big picture with me. And its built-in checkpoints also give me more space to pivot or course-correct.

Just tried GSD and Plan Mode on the exact same task (prompt in an MD file). Plan Mode produced a plan and then a base implementation in twenty minutes. GSD ran for hours to achieve the same thing.

I reviewed the code from both and the GSD code was definitely written with the rest of the project and possibilities in mind, while the Claude Plan was just enough for the MVP.

I can see both having their pros and cons depending on your workflow and size of the task.

I use GitHub Copilot and unfortunately there has been a weird regression in the bundled Plan mode. When they added the new plan memory, it suddenly started getting both VERY verbose in the plan output and vague in the details. It adds a lot of steps like "design" and "figure out", and railroads you into implementation without asking follow-up questions.

  • I find that even with Opus 4.6, Copilot feels handicapped. I’m not sure if it’s related to memory or what, but if I give two tasks to Opus 4.6, one in CC and one in Copilot, CC is substantially better.

    I’ve been really enjoying Codex CLI recently, though. It seems to do just as well as Opus 4.6, but using the standard GPT 5.4.

    • I have the same experience with Antigravity and Gemini CLI, both using Gemini 3 Pro. The CLI works on the problem with more effort and time. Meanwhile, Antigravity writes shitty Python scripts for a few seconds and calls it a day. The agent harness matters a lot.

    • I think this shows that the model alone isn't the complete story and that these "harnesses" (as people seem to be calling them) shape a lot of the experienced behavior of these tools.


  • > VERY verbose in the plan output

    Is that an issue? GitHub charges per-request, not per-token, so a verbose output and a short output cost the same.

    What model are you using?

    • The problem might be that our brains charge per token, which makes reviewing hard. :)

Same experience. Superpowers is a little too overzealous at times. For coding especially, I don’t like seeing a comprehensive design spec written (good) and then turned into effectively the same doc, macro-expanded into a complete implementation with the literal code for the entire thing in a second doc (bad). Even for trivial changes I’d end up with a good, succinct -design.md, then an -implementation.md, and then a swarm of sub-agents getting into races while more or less just grabbing a block from the implementation file and writing it out.

A mess. I still enjoy Superpowers brainstorming, but I pull the chute towards the end and deliver the implementation myself.

  • Yes. I sometimes had to specifically ask it to NOT add any code to the specs because that would be done at a later stage.

Yup yup yup. I burned literally a week's worth of the $20 Claude subscription and then $20 worth of API credits on GSD v2, to get like 500 LOC.

And that was AFTER burning a week's worth of the Codex and Claude $20 plans and $50 of API credits and getting completely bumfucked - the AI was faking out tests, etc.

I had better experiences just guiding the thing myself. It definitely was not a set-and-forget experience (6 hours of constant monitoring), but I was able to get a full research MVP that informed the next iteration with only 75% of a Codex weekly plan.

I've played around a bit with the plugins and, as you said, plan mode really handles things fine for the most part. I've got various workflows I run through in Claude, and having CC create custom skills/agents for them gets me 80% of the way there. Letting the Claude file refer to them, rather than trying to define entire workflows within it, also goes a long way. It'll still forget things here and there, leading to wasted tokens as it realizes it's being dumb and corrects itself, but nothing too crazy. At the very least, it's more than enough to let me continue using it naturally rather than memorizing a million slash commands to invoke manually.
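
For reference, a skill of this kind is just a small instruction file the agent loads on demand; in Claude Code the convention is a SKILL.md with YAML frontmatter under `.claude/skills/<name>/`. The skill below is a made-up example (the workflow and the `docs/release-style.md` path are illustrative, not from any real project):

```markdown
---
name: release-notes
description: Draft release notes from commits since the last tag.
---

1. Run `git log $(git describe --tags --abbrev=0)..HEAD --oneline`.
2. Group the commits into features, fixes, and chores.
3. Draft the notes following the style guide in `docs/release-style.md`.
```

CLAUDE.md then only needs a one-line pointer ("use the release-notes skill for changelog work") instead of carrying the whole workflow inline.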

I have been using Superpowers for Gryph development for a while. Love the brainstorming and exploration that it brings in. Haven’t really compared token usage, but it's on my list.

> I was using this and superpowers but eventually, Plan mode became enough and I prefer to steer Claude Code myself.

Plan mode is great, but to me that's just prompting your LLM agent of choice to generate an ad-hoc, imprecise, and incomplete spec.

The downside of specs is that they can consume a lot of context window with things that are not needed for the task. When that is a concern, passing the spec to plan mode tends to mitigate the issue.

Why use CLI wrappers if you're using Claude Code? I get it if you need something like Codex, but they released sub-agents today, so maybe not even that; for Claude Code it's an unnecessary wrapper.

  • Wrappers are useful for some tasks. I use Ralph loops for things that are extremely complicated and take days of work, like reverse-engineering projects or large-scale migration efforts.

    • Even with the 1M-token context windows? Can't you just keep the orchestrator going and run sub-agents? Maybe the added space is too new? I also haven't tested the context rot from 300K tokens and up. Would love some color on it from first-hand experience.


  • So that you can have a fresh context for every little thing. These harnesses basically marry LLMs with deterministic software logic. The harness programmatically generates the prompts and stores the output, step by step.

    You never want the LLM to do anything that deterministic software does better, because it inflates the context and is not guaranteed to be done accurately. This includes things like tracking progress, figuring out dependency ordering, etc.