Comment by minimaxir

2 years ago

Explaining and utilizing bootstrapping would make this post even longer and much more difficult to understand for non-statisticians.

Bootstrapping is best used to compensate for small amounts of data, which is why I suggested that one change going forward would be to generate much more synthetic data.

Would it? You didn't need to explain the theory behind the KS test. A bootstrap result is easier to interpret: it could be something like “the $500 tip results in answers that are 0.95 characters closer to the target, on average”. That seems a lot better than the unitless, weirdly scaled KS values.

Bootstrapping works great for any volume of data. It's also nice that mean-difference bootstraps make extremely few distributional assumptions, which is really handy with source data distributions like these that are hard to model.
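
For anyone curious what that looks like in practice, here is a rough sketch of a percentile bootstrap for a difference in means. The function name `bootstrap_mean_diff`, the group names, and the numbers are all made up for illustration (they are not the post's data or code), and the percentile interval is just one of several bootstrap interval types:

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_resamples=10_000, ci=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Hypothetical helper for illustration; not taken from the original post.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each group independently, with replacement, at its original size.
    idx_a = rng.integers(0, len(a), size=(n_resamples, len(a)))
    idx_b = rng.integers(0, len(b), size=(n_resamples, len(b)))
    # One mean difference per resample.
    diffs = a[idx_a].mean(axis=1) - b[idx_b].mean(axis=1)
    # Take the middle `ci` fraction of the resampled differences.
    alpha = (1.0 - ci) / 2.0
    lo, hi = np.quantile(diffs, [alpha, 1.0 - alpha])
    return a.mean() - b.mean(), lo, hi

# Made-up "distance from target length" values, in characters.
rng = np.random.default_rng(42)
no_tip = np.abs(rng.normal(30, 20, size=100))
tip_500 = np.abs(rng.normal(29, 20, size=100))

point, lo, hi = bootstrap_mean_diff(tip_500, no_tip)
print(f"mean difference: {point:.2f} characters "
      f"({lo:.2f} to {hi:.2f}, 95% interval)")
```

The output is in the same units as the measurement (characters), and the only thing the procedure asks of the data is that resampling the observations is a reasonable stand-in for drawing new ones.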