Comment by solid_fuel

7 months ago

I would love to see a comparison of the pull requests generated by each workflow, if possible. My experience with Copilot has generally been that it suggests far more code than I would actually write to solve a specific problem - sometimes adding extra checks where they aren't needed, sometimes just being more verbose than I would be, and oftentimes repeating itself where it would be better to use an abstraction.

My personal hypothesis is that seeing the LLM write _so much_ code may create the feeling that the problems it is solving would take longer to solve by yourself.

narush 7 months ago

Check out the section "AI increasing issue scope" (C.2.3) in the paper -- https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

We speak (as best we can) to changes in the amount of code -- I'll note that this metric is quite messy and hard to reason about!