Comment by jebarker

6 months ago

> code still needs to be reviewed and tested, at least as much as you'd scrutinize the code of a brand new engineer just out of boot camp

> ..._massive_ boost to productivity. ~20% of the commits to the OpenHands codebase are now authored or co-authored by OpenHands itself.

I'm having trouble reconciling these statements. Where does the productivity boost come from since that reviewing burden seems much greater than you'd have if you knew commits were coming from a competent human?

11 comments

jebarker

lars512 6 months ago

There's often a lot of small fixes that not time efficient to do, but a solution is not much code and is quick to verify.

If the cost is small to setting a coding agent (e.g. aider) on a task, seeing if it reaches a quick solution, and just aborting if it spins out, you can solve a subset of these types of issues very quickly, instead of leaving them in issue tracking to grow stale. That lets you up the polish on your work.

That's still quite a different story to having it do the core, most important part of your work. That feels a little further away. One of the challenges is the scout rule, the refactoring alongside change that makes the codebase nicer. I feel like today it's easier to get a correct change that slightly degrades codebase quality, than one that maintains it.

jebarker 6 months ago
Thanks - this all makes sense - I still don't feel like this would constitute a massive productivity boost in most cases, since it's not fixing time consuming major issues. But I can see how it's nice to have.
- rbren 6 months ago
  
  The bigger win comes not from saving keystrokes, but from saving you from a context switch.
  Merge conflicts are probably the biggest one for me. I put up a PR and move onto a new task. Someone approves, but now there are conflicts. I could switch off my task, spend 5-10 min remembering the intent of this PR and fixing the issues. Or I could just say "@openhands fix the merge conflicts" and move back to my new task.
  
  1 reply →

lolinder 6 months ago

I haven't started doing this with agents, but with autocomplete models I know exactly what OP is talking about: you stop trying to use models for things that models are bad at. A lot of people complain that Copilot is more harm than good, but after a couple of months of using it I figured out when to bother and when not to bother and it's been a huge help since then.

I imagine the same thing applies to agents. You can waste a lot of time by giving them tasks that are beyond them and then having to review complicated work that is more likely to be wrong than right. But once you develop an intuition for what they can and cannot do you can act appropriately.

drewbug01 6 months ago

I suspect that many engineers do not expend significant energy on reviewing code; especially if the change is lengthy.

linsomniac 6 months ago

>burden seems much greater than...

Because the burden is much lower than if you were authoring the same commit yourself without any automation?

jebarker 6 months ago
Is that true? I'd like to think my commits are less burdensome to review than a fresh out of boot camp junior dev especially if all that's being done is fixing linter issues. Perhaps there's a small benefit, but doesn't seem like a major productivity boost.
- ErikBjare 6 months ago
  
  A junior dev is not a good approximation of the strengths and weaknesses of these models.
  
  2 replies →