Comment by afro88

7 days ago

At least on a team, the limit is the team's time to review all the code. We've also found that vibe engineered (or "supervised vibing" as I call it) code tends to have more issues in code review because of a false sense of security creating blind spots when self reviewing. Even more burden on the team.

We're experimenting with code review prompts and sub agents. Seems local reviews are best, so the bulk of the burden is on the vibing engineer, rather than the team.

14 comments

afro88

bartread 7 days ago

Do you have a sense for how much overhead this is all adding? Or, to put it another way, what I’m really asking is what productivity gain (or loss) are you seeing versus traditional engineering?

lossolo 6 days ago
In our experience, it depends on the task and the language. In the case of trivial or boilerplate code, even if someone pushes 3k-4k lines of code in one day, it's manageable because you can just go through it. However, 3k lines of interconnected modules, complex interactions, and intricate logic require a lot of brainpower and time to review properly and in most cases, there are multiple bugs, edge cases that haven't been considered, and other issues scattered throughout the code.
- agentultra 6 days ago
  
  And empirical studies on informal code review show that humans have a very small impact on error rates. It disappears when they read more than roughly 200 SLOC in an hour.
  
  2 replies →
eugmill 6 days ago

Isn't the current state of thing such that it's really hard to tell? I think the METR study showed that self-reported productivity boosts aren't necessarily reliable.
I have been messing with vibe engineering on a solo project and I have such a hard time telling if there's an improvement. It's this feeling of "what's faster, one lead engineer coding or one lead engineer guiding 3 energetic but naive interns"?
rhetocj23 7 days ago
Very curious to hear responses about this too
- oblio 6 days ago
  
  The problem with this is that software engineering is a very unorganized and fashion/emotion driven domain.
  We don't have reliable productivity numbers for basically... anything.
  I <feel> that I'm more productive with statically typed languages but I haven't seen large scale, reliable studies. Same with unit tests, integration tests, etc.
  And then there are all the types of software engineering: web frontend, web API, mobile frontend, command line frontend, Windows GUI, MacOS GUI, Linux backend (10 million different stacks), Windows backend (1 million different stacks), throwaway projects, WordPress webpages, etc, etc.
  
  6 replies →