← Back to context

Comment by _pdp_

12 hours ago

One of our projects has 1.2K open pull requests.

https://i.postimg.cc/Jnfk9b8g/Xnapper-2026-02-28-22-25-42.pn...

We probably accept 1-2 per day.

I personally discard code for the tiniest of reasons. If something feels off moments after I open the PR, it gets deleted. The reason we still have 1.2K open PRs is because we can't review all of them in time.

The most likely solution is to delete all of them after a month or two. By that time the open PRs on this project alone will be at least 10-20 more.

Doesn't seem like too efficient process, no? Seems to me like investment in better quality of the output is exactly what is needed here, wouldn't you agree?

  • I feel they sit of on the opposite end of the OP here. One wants to write out specs to control the agent implementation to achieve a one shot execution. Other side says: let’s won’t waste time of humans writing anything.

    I’m personally torn. A lot of the spec talk and now here in combination with TDD etc feels like the pipe dreams of the mid 2000. There was this idea of the Architect role who writes UML and specs. And a normal engineer just fills in the gaps. Then there was TDD. Nothing against it personally. But trying to write code in test first approach when you don’t really have a clue how a specific platform/system/library works had tons of overhead. Also the side effect of code written in the most convenient way to be tested and not to be executed. All in all to throw this ideas together for AI now… But throwing tokens out of the window and hoping for the token lottery to generate the best PR is also not the right direction in my book. But somebody needs to investigate in both extremes I say.

    • Actually, nobody said the spec needs to be written by humans.

      My personal opinion: with today's LLMs, the spec should be steered by a human because its quality is proportional to result quality. Human interaction is much cheaper at that stage — it's all natural language that makes sense. Later, reasoning about the code itself will be harder.

      In general, any non-trivial, valuable output must be based on some verification loop. A spec is just one way to express verification (natural language — a bit fuzzy, but still counts). Others are typecheckers, tests, and linters (especially when linter rules relate to correctness, not just cosmetics).

      Personally, on non-trivial tasks, I see very good results with iterative, interactive, verifiable loops:

      - Start with a task

      - Write spec in e.g. SPEC.md → "ask question" until answer is "ok"/proceed

      - Write implementation PLAN.md — topologically sorted list of steps, possibly with substeps → ask question

      - For each step: implement, write tests, verify (step isn't done until tests pass, typecheck passes, etc.); update SPEC/PLAN as needed → ask question

      - When done, convert SPEC.md and PLAN.md into PR description (summary) and discard

      ("Ask question" means an interactive prompt that appears for the user. Each step is gated by this prompt — it holds off further progress, giving you a chance to review and modify the result in small bits you can actually reason about.) The workflow: you accept all changes before confirming the next step. This way you get code deltas that make sense. You can review and understand them, and if something's wrong you can modify by hand (especially renames, which editors like VS Code handle nicely) or prompt for a change. The LLM is instructed to proceed only when the re-asked answer is "ok".

      This works with systems like VSCode Copilot, not so much with CC cli.

      I'm looking forward to an automated setup where the "human" is replaced by an "LLM judge" — I think you could already design a fairly efficient system like this, but for my work LLMs aren't quite there yet.

      That said, there's an aspect that shouldn't be forgotten: this interactive approach keeps you in the driving seat and you know what's happening with the codebase, especially if you're running many of these loops per day. Fully automated solutions leave you outside the picture. You'll quickly get disconnected from what's going on — it'll feel more like a project run by another team where you kind of know what it does on the surface but have no idea how. IMO this is dangerous for long-term, sustainable development.