← Back to context

Comment by hansvm

4 years ago

Pearson still has a lot of advantages on modern hardware and in practice should be a ton faster, but:

(1) Rank coefficients can absolutely be updated online.

(2) There aren't many applications where an extra polylogarithmic factor is the difference between a good-enough algorithm and something too slow.

(3) The RNG is an implementation detail I'd probably omit. It's asymptotically equivalent to the deterministic operation of rescaling the sum of all pairwise absolute differences of Y values for each run of identical X values. If in your initial sort of the X values you instead lexicographically sort by X then Y then you can get away with a linear algorithm for computing those pairwise absolute differences.