Comment by robwwilliams
3 days ago
Or a sampling artifact. 4 vs 12 does seem significant within a study, but consider a set of N such studies.
I assume that many large companies have tested the efficiency gains and losses of their programmers much more extensively than the authors of this tiny study.
A survey of companies and their evaluations and conclusions would carry more weight, excluding companies selling AI products, of course.
If you use a binomial test, P(X <= 4) is about 0.105, which means a two-sided p of about 0.21.
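
For anyone who wants to check the arithmetic, here is a rough sketch of an exact binomial test. It rests on assumptions that are mine, not the study's: that "4 vs 12" means 4 outcomes one way out of n = 16, and that the null is a fair coin (p = 0.5). The function names are made up for illustration. Under those assumptions the lower tail P(X <= 4) comes out near 0.038 (two-sided about 0.077), while 0.105 is what you get for P(X <= 5), so the figures quoted above may rest on a different setup. The last helper is a crude take on the "N such studies" point: the chance that at least one of N independent null studies looks this extreme.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def two_sided_p(k, n):
    """Two-sided p-value under a fair-coin null (p = 0.5).

    Doubles the smaller tail, which is exact here because the
    null distribution is symmetric.
    """
    lower = binom_cdf(k, n)                              # P(X <= k)
    upper = 1.0 - binom_cdf(k - 1, n) if k > 0 else 1.0  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))

def prob_any_extreme(p_value, n_studies):
    """Chance that at least one of n_studies independent null
    studies produces a result with this p-value or smaller."""
    return 1 - (1 - p_value) ** n_studies

# Assumed setup: 4 vs 12 read as 4 out of 16.
n = 4 + 12
print(f"P(X <= 4) = {binom_cdf(4, n):.4f}")
print(f"P(X <= 5) = {binom_cdf(5, n):.4f}")
print(f"two-sided p for k = 4: {two_sided_p(4, n):.4f}")
print(f"any of 10 null studies this extreme: {prob_any_extreme(two_sided_p(4, n), 10):.2f}")
```

Swap in whatever n and k the study actually reports; the point is only that the p-value is easy to recompute and is sensitive to how the split is counted.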