Comment by sheepfacts
2 days ago
Perhaps it is difficult to measure personal productivity in programming, but we can certainly measure that we run more slowly with 10 kg in our backpack. I propose this procedure: the SWE selects 10 tasks and guesses a measure of their complexity (time to finish them), then randomly selects 5 to be done with AI and the rest without. He performs them and finally computes a deviation D = D_0 - D_1, where D_i = sum(real_time/guessed_time - 1), D_0 is the sum over the AI tasks, and D_1 the sum over the non-AI ones. The sign of D measures whether using AI is beneficial (negative) or detrimental (positive), and its magnitude measures the size of the impact. Clipping individual addends to the interval [-0.5, 0.5] should keep one bad guess from dominating the estimate. Sorry if this is a trivial idea, but it is feasible and should intuitively provide useful information, provided the tasks are chosen from among those where the initial guesses tend to have small deviation. A filter should also exclude tasks in which AI scaffolding exceeds a certain relative share of the time, in case we want to generalize the results to tasks where scaffolding does not dominate. A sketch of the computation is below.
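A minimal sketch in Python of the deviation computation, assuming each task is recorded as a (guessed_hours, real_hours) pair. The function names, the 0.5 clipping constant from above, and the example times are all illustrative assumptions, not data from any study:

    import random

    CLIP = 0.5  # clip each addend to [-0.5, 0.5] so one bad guess can't dominate

    def overrun(guessed_hours: float, real_hours: float) -> float:
        """Relative overrun real/guessed - 1, clipped to [-CLIP, CLIP]."""
        return max(-CLIP, min(CLIP, real_hours / guessed_hours - 1))

    def deviation(with_ai, without_ai) -> float:
        """D = D_0 - D_1 over (guessed, real) pairs.
        D < 0 suggests AI was beneficial; |D| is the size of the effect."""
        d0 = sum(overrun(g, r) for g, r in with_ai)
        d1 = sum(overrun(g, r) for g, r in without_ai)
        return d0 - d1

    # Estimate 10 tasks up front, then randomly assign 5 to the AI condition.
    guesses = {f"task{i}": 2.0 for i in range(10)}    # hypothetical estimates
    ai_names = set(random.sample(sorted(guesses), 5)) # random assignment
    # ...perform the tasks, recording real times, then (made-up numbers):
    with_ai = [(2.0, 1.6), (2.0, 2.4), (2.0, 2.0), (2.0, 1.8), (2.0, 2.1)]
    without_ai = [(2.0, 2.2), (2.0, 1.9), (2.0, 2.5), (2.0, 2.0), (2.0, 2.3)]
    print(deviation(with_ai, without_ai))  # negative => AI looked beneficial here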
It could happen that the impact of using AI depends on the task at hand, on the SWE's ability to pair-program with it, and on the LLM used, to such an extent that those factors are larger than the average effect over a bag of tasks; in that case the large deviation from the mean would make any single-parameter estimate void of useful information.
That's pretty much what the study the article refers to did, and it found that using AI made developers 19% slower.