Comment by not2b
11 hours ago
If the result is statistically significant, it just barely makes it. 84.8% isn't that much higher than 80.8% and they had only 250 prompts, if I'm reading this right.
In a field where progress is measured in tenths of a percentage point, that's not true. Think of it this way: the error rate drops from about 19% to about 15%, or roughly from 1 in 5 to 1 in 6.
Statistical significance is about whether an effect can reliably be said to have been measured at all; it's not about whether or not the effect itself would be significant in the sense of moving some other needle.
The ~5% relative improvement reported here (4 percentage points, 80.8% to 84.8%) might just be an artefact of the data collection or random variation, rather than a consistent, repeatable change.
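For anyone who wants to check the numbers, here is a rough sketch of a two-proportion z-test in Python. It assumes two independent runs of 250 prompts each, which gives 212 and 202 successes for 84.8% and 80.8%; the thread doesn't say how the evaluation was actually set up, and the function name is just something made up for illustration. If both systems were scored on the same 250 prompts, a paired test such as McNemar's would be more appropriate, but that needs per-prompt results that aren't given here.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis of no real difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF computed from the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Assumed counts: 84.8% and 80.8% of 250 prompts each (independent samples).
z, p = two_proportion_z_test(212, 250, 202, 250)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the p-value comes out well above the usual 0.05 threshold, so an unpaired comparison alone would not separate the two scores from random variation. A paired analysis could give a different answer.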