Comment by Tainnor

2 days ago

The sample size isn't 16 developers, it's 246 issues.

I agree with that - but on the other hand, surely the number of developers matters here? For example, if instead of 16 developers the study had consisted of a single developer completing all 246 tasks with or without AI, and compared the observed completion times, I think most people would question the reproducibility and relevance of the study.

  • It matters in the sense that it is unclear whether the findings generalise to other people. That is a problem many studies share, even those with more participants, because their participant pool may not be diverse enough.

    But in terms of pure statistical validity, I don't think it matters.
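To make the clustering question concrete, here is a toy simulation (my own illustration with made-up effect sizes, not numbers from the study). If developers differ systematically from one another, 246 issues spread over 16 developers are not 246 independent observations: the uncertainty in the estimated mean is dominated by the 16 developer-level effects.

```python
import random
import statistics

random.seed(0)

def study_mean(n_devs, issues_per_dev, dev_sd=1.0, issue_sd=1.0):
    """Mean outcome for one simulated study with developer-level noise."""
    vals = []
    for _ in range(n_devs):
        dev_effect = random.gauss(0, dev_sd)   # per-developer baseline
        for _ in range(issues_per_dev):
            vals.append(dev_effect + random.gauss(0, issue_sd))
    return statistics.mean(vals)

# Spread of the estimated mean over many repeated "studies":
# 246 issues from 16 developers vs. 246 issues from 246 developers.
spread_16 = statistics.stdev(study_mean(16, 15) for _ in range(500))
spread_246 = statistics.stdev(study_mean(246, 1) for _ in range(500))
print(spread_16, spread_246)
```

With these (invented) parameters, the 16-developer design gives a noticeably noisier estimate, which is the formal version of the worry above. Whether it bites in practice depends on how large the between-developer variance actually is, and on whether the analysis accounts for clustering.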

Whilst my recent experience possibly agrees with the findings, I came here to moan about the methods. Whether it's 16 or 246, that's still a miserably small sample size.

Okay, so why not 246,000 issues?

  • If you read through the methodology, including how they paid the participants $150/hr for 20-40 hours of work per participant, you can probably hazard a guess why they didn't scale up the size of the study by 1000x.
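Back-of-envelope on the cost (my own arithmetic, taking 30 hours as a midpoint of the quoted 20-40 hour range):

```python
rate = 150    # USD per hour, as quoted above
hours = 30    # assumed midpoint of the 20-40 hr range
devs = 16

actual = rate * hours * devs    # participant payments for the actual study
scaled = actual * 1000          # the same line item scaled 1000x
print(actual, scaled)           # 72000 72000000
```

So participant payments alone were on the order of $72k, and a 1000x version would run to tens of millions of dollars before counting any other cost.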