Comment by solid_fuel
3 days ago
I would love to see a comparison of the pull requests generated by each workflow, if possible. My experience with Copilot has generally been that it suggests far more code than I would actually write to solve a specific problem - sometimes adding extra checks where they aren't needed, sometimes just being more verbose than I would be, and oftentimes repeating itself where it would be better to use an abstraction.
My personal hypothesis is that seeing the LLM write _so much_ code may create the feeling that the problems it is solving would take longer to solve by yourself.
Check out the section "AI increasing issue scope" (C.2.3) in the paper -- https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
We speak (as best we can) to changes in the amount of code -- I'll note that this metric is quite messy and hard to reason about!