Comment by thesz

4 years ago

I should point out that Pearson's correlation coefficient can be updated online: its storage complexity is O(1). The storage complexity for sorting is O(N).

Thus, this particular algorithm will not fit into, say, a traffic-analysis framework.
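For reference, the O(1)-storage online update can be sketched with Welford-style running moments (the `StreamingPearson` class and its method names here are illustrative, not from any particular library):

```python
import math

class StreamingPearson:
    """Pearson's r with O(1) storage: only running moments are kept,
    never the observations themselves (Welford-style updates)."""

    def __init__(self):
        self.n = 0
        self.mean_x = self.mean_y = 0.0
        self.m2_x = self.m2_y = 0.0   # running sums of squared deviations
        self.c_xy = 0.0               # running co-moment of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # note: each term mixes the old delta with the updated mean
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def corr(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom else float("nan")

# constant storage no matter how many points stream through
sp = StreamingPearson()
for x in range(1, 1001):
    sp.update(x, 2.0 * x + 1.0)   # perfectly linear stream
# sp.corr() → 1.0 (up to floating-point rounding)
```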

Yeah, this has worse complexity than Pearson. However, it's not competing on complexity.

The point of it is to detect general dependencies, whereas Pearson only detects linear ones (cases where Y is completely determined by X yet their Pearson correlation is 0 are trivial to construct). The "linear" in linear dependency isn't the same as the "linear" in linear complexity here, but it isn't surprising that an algorithm able to detect more general dependencies has worse-than-linear runtime (and O(N log N) is pretty much the next best thing).
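To make that parenthetical concrete: take X symmetric around 0 and Y = X². Y is fully determined by X, yet Pearson's r is exactly 0 (a small self-contained check):

```python
import math

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # symmetric around 0
ys = [x * x for x in xs]           # Y is completely determined by X

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
# r == 0.0: the positive and negative branches of the parabola
# cancel, so Pearson reports no (linear) relationship at all
```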

In the absence of a known algorithm with the same properties and better complexity, it doesn't seem unreasonable to call it "computationally efficient": its complexity is sufficient for most uses, as opposed to, say, quadratic or exponential.

The runtime and storage requirements you cite only apply when you need exact answers.

If you want to compute a correlation coefficient, you are probably happy with, e.g., three significant digits of precision. Thus you can use approximation algorithms and sampling.
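For instance, estimating r on a modest random subsample typically recovers the leading digits of the full-data value (an illustrative sketch; the `pearson` helper and the population/sample sizes are arbitrary choices, not a prescription):

```python
import math
import random

def pearson(pairs):
    """Plain two-pass Pearson correlation over a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)

random.seed(0)
# a large "population": noisy linear relationship
data = [(x, 0.5 * x + random.gauss(0.0, 1.0))
        for x in (random.uniform(0.0, 10.0) for _ in range(100_000))]

exact = pearson(data)
approx = pearson(random.sample(data, 2_000))  # ~2% random sample
# approx typically matches exact to about two decimal places,
# at a fraction of the memory and compute
```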