Comment by robwwilliams

3 days ago

Or a sampling artifact. 4 vs 12 does seem significant within a single study, but consider a set of N such studies: some of them would show a split this lopsided by chance alone.

I assume that many large companies have tested the efficiency gains and losses of their programmers much more extensively than the authors of this tiny study.

A survey of companies, with their evaluations and conclusions, would carry more weight, excluding companies selling AI products, of course.

If you use a binomial test, P(X<=4) is about 0.105, which means a two-sided p of about 0.21.
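
For anyone who wants to check this kind of number themselves, here is a minimal sketch of a binomial tail probability and a doubled two-sided p-value. It assumes a fair-coin null (p = 0.5), and the n = 16 in the usage lines is only my reading of "4 vs 12", not something stated in the thread. The function names are made up for illustration. The result depends on which n and cutoff you assume, so it need not reproduce the figures quoted above.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def two_sided_p(k, n):
    """Two-sided p-value under a p = 0.5 null: double the smaller tail, cap at 1."""
    lower = binom_cdf(k, n)            # P(X <= k)
    upper = 1.0 - binom_cdf(k - 1, n)  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# Assumed for illustration only: 16 trials that split 4 vs 12.
n, k = 16, 4
print("one-sided tail:", binom_cdf(k, n))
print("two-sided p:   ", two_sided_p(k, n))
```

Doubling the smaller tail is only one convention for a two-sided binomial p-value; with a symmetric null like p = 0.5 it agrees with the usual alternatives, but for other nulls it may not.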