Comment by senko
7 days ago
> They're also not going to be able to direct three different agents at once in different areas of a large project that they've designed the architecture for.
I wonder what the practical limits are.
As a senior dev on a greenfield solo project, I find it too exhausting to run two parallel agents (front/back); most of the time they're waiting for me to spec, review, or do acceptance testing. It feels like sprinting, not something I could do day in and day out.
That might be because my tasks are too fine-grained, but assuming larger ones take proportionally longer to spec and review, I don't see more than two (or, okay, three, maybe I'm just slow) being a realistic scenario.
Beyond that, I think we're firmly in vibe coding (or maybe spec-driven vibe coding) territory.
At least on a team, the limit is the team's time to review all the code. We've also found that vibe-engineered (or "supervised vibing," as I call it) code tends to have more issues in code review, because a false sense of security creates blind spots when self-reviewing. That puts even more burden on the team.
We're experimenting with code review prompts and subagents. It seems local reviews work best, so the bulk of the burden stays on the vibing engineer rather than the team.
Do you have a sense for how much overhead this is all adding? Or, to put it another way, what I’m really asking is what productivity gain (or loss) are you seeing versus traditional engineering?
In our experience, it depends on the task and the language. For trivial or boilerplate code, even if someone pushes 3k-4k lines in one day, it's manageable because you can just skim through it. However, 3k lines of interconnected modules, complex interactions, and intricate logic take a lot of brainpower and time to review properly, and in most cases there are multiple bugs, unhandled edge cases, and other issues scattered throughout the code.
Isn't the current state of things such that it's really hard to tell? I think the METR study showed that self-reported productivity boosts aren't necessarily reliable.
I have been messing with vibe engineering on a solo project and I have such a hard time telling if there's an improvement. It's this feeling of "what's faster, one lead engineer coding or one lead engineer guiding 3 energetic but naive interns"?
Very curious to hear responses about this too
I relate to the exhaustion. Actually, the context-switching fatigue is why we built Sculptor for ourselves (https://imbue.com/sculptor). We usually see devs running 4-6 agents in parallel using Sculptor today. Personally, I think much of the fatigue comes from: 1) friction in spawning agents, 2) friction in reviewing agent changes, and 3) context-management annoyance, e.g. when you start debugging part of the agent's work but then have to reload context to continue the original task.
It's still super early, but we've felt a lot less fatigued using Sculptor so far. To make it easier to spawn agents without worrying, we run agents in containers so they can run in YOLO mode and don't interfere with each other. To make it easy to review changes, we built "Pairing Mode," which lets you instantly sync any agent's work from the container into your local IDE to test it, then switch to another.
For context management, we just shipped the ability to fork agents from any point in the conversation history, so you can reuse an agent that you loaded with high-quality context and fork off to debug an agent's changes or try each of the options it presented. It also lets you keep a few explorations going and check in when you have time.
Anyway, sorry, shilling the product a bit much but I just wanted to say that we've seen people successfully use more than 2 agents without feeling exhausted!
What gives you the fatigue?
Switching between the two parallel agents (frontend & backend, same project), which requires constant context switches.
I'm speccing out the task in detail for one agent, then reviewing code for the previous task on the other agent and testing the implementation, then speccing the next part for that one (or asking for fixes/tweaks), then back to the first agent.
They're way faster in producing code than I am in reviewing and spelling out in details what I want, meaning I always have the other one ready.
When doing everything myself, there are periods where I need to think hard and periods where it's pretty straightforward and easy (typing out the stuff I envisioned, boilerplate, etc.).
With two agents, I constantly need to be on full alert and totally focused (but switching contexts every few minutes), which is way more tiring for me.
With just one agent, the pauses in the workflow (while I'm waiting for it to finish) are long enough to get distracted but too short to do anything else (mostly).
Still figuring out the sweet spot for me personally.
I've been meaning to try out some speech-to-text to see if that makes it a bit easier. Part of the difficulty of "spelling out in detail what I want" is the need for precise written language, which carries a high cognitive load, which in turn makes the context switching difficult.
I've been wondering whether just speaking naturally could be faster than typing. Maybe add an embedded transform/compaction step that strips out all the ummms and gets to the point of what you were trying to say. That might lower the cognitive load, which could make the switching easier.