Comment by fieldcny

4 years ago

Dumb question: what is the significance of this?

From a signal processing perspective: being able to recognise signals in the presence of interference, noise and distortion.

For example, you might have a radio signal (such as WiFi) that you want to receive. The first step is to pick that signal out of whatever comes out of your radio receiver, which will be the WiFi signal along with all sorts of noise and interference from other users. Typically the search is done with the aforementioned Pearson correlation, comparing the received signal against an expected template: a value of 1.0 means the received signal is a perfect match with the template, a value of 0.0 means no match at all. If the wanted signal is present, interference, noise and distortion will reduce the correlation to less than 1.0, meaning you might miss the WiFi signal even though it is there.

This article is about coming up with a measure that gives a more robust result in the face of noise, interference and distortion. It's fundamental stuff, in that it has quite general application.
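A minimal sketch of that detection idea. The template (a sine burst), the noise level, and the seed are all my own made-up stand-ins, not anything from the article:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
n = 256

# Hypothetical "expected template": the waveform the receiver searches for.
template = np.sin(2 * np.pi * 5 * np.arange(n) / n)

# Received signal: the template buried in additive noise (a crude stand-in
# for interference and distortion).
received = template + 0.5 * rng.standard_normal(n)

# Pearson correlation between template and received signal: high, but
# pushed below 1.0 by the noise.
r_present = np.corrcoef(template, received)[0, 1]

# A window containing only noise correlates with the template near 0.
noise_only = 0.5 * rng.standard_normal(n)
r_absent = np.corrcoef(template, noise_only)[0, 1]
```

A detector would then compare the correlation against a threshold; the noise is exactly what makes choosing that threshold hard.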

  • (Yay signal processing!)

    Skimming it now, this looks wild. Using the variance of the rank of the dataset (for a given point, how many are less than that point) seems... weird, and throwing out some information. The author seems legit tho, so I can't wait to try drop-in implementing this in a few things.

    • Rank-transforms are pretty common: they show up in a lot of non-parametric hypothesis tests, for example.

      The neat thing about ranks is that, in aggregate, they're very robust. You can make an estimate of the mean arbitrarily bad by tweaking a single data point: just send it towards +/- infinity and the mean will follow. The median, on the other hand, is barely affected by that sort of shenanigans.
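      That robustness is easy to see with made-up numbers (the data here is invented for illustration):

      ```python
      from statistics import mean, median

      data = [9.8, 10.1, 10.0, 9.9, 10.2]  # measurements clustered near 10

      # Corrupt a single point by sending it toward infinity...
      corrupted = data + [1e6]

      # ...the mean follows it, but the median barely moves.
      bad_mean = mean(corrupted)
      good_median = median(corrupted)
      ```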

Correlation typically means y is a linear function of x, but people usually interpret it (incorrectly) as: knowing x tells you something about y. If y = x^2, then y is determined completely by x, but since the relationship is nonlinear the correlation may actually be zero, depending on the distribution of x. This paper proposes a statistic that will indicate whether y is related to any function of x, linear or nonlinear.
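The y = x^2 case is easy to check numerically; with x distributed symmetrically around zero, the Pearson correlation vanishes exactly:

```python
import numpy as np

# x symmetric around zero; y is completely determined by x...
x = np.linspace(-1.0, 1.0, 101)
y = x ** 2

# ...yet the Pearson correlation is (numerically) zero, because the
# positive and negative slopes of the parabola cancel.
r = np.corrcoef(x, y)[0, 1]
```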

  • This is... quite wrong? The dictionary says:

      1. a mutual relationship or connection between two or more things
      2. [Statistics] interdependence of variable quantities.
      3. [Statistics] a quantity measuring the extent of the interdependence of variable quantities.
    

    The most sympathetic to your definition is Wikipedia:

        In statistics, correlation or dependence is any statistical
        relationship, whether causal or not, between two random variables or bivariate
        data. In the broadest sense correlation is any statistical association, though 
        it actually refers to the degree to which a pair of variables are linearly 
        related.
    

    And that's the mathematical formulation. Correlation also has a meaning in everyday speech, and mathematics doesn't have the authority to adopt everyday terms and then claim people are using them wrongly once it has changed the meaning.

    Also correlation very definitely means that knowing <x> tells you something about <y>. And vice versa. Like, for example: its value. Or at least a better idea of it than pure guessing without correlation.

  • I don't think there's a standard enough mathematical definition of correlation to say that. Perhaps the word has been co-opted, but the paper linked suggests that the co-option isn't universally accepted.

Well, the abstract says: “[a coefficient] which is 0 if and only if the variables are independent and 1 if and only if one is a measurable function of the other”. The former property does not hold for the Pearson correlation on general random variables (though it does for jointly Gaussian ones, which is one part of the reason they are used everywhere). I’m not sure about the latter property, actually, but I doubt it holds for Pearson either.

Worth noting the author is a highly regarded professor at Stanford.

It's fast to calculate, simple to understand, and doesn't make assumptions about the underlying distributions. This makes it a more effective generic tool for practitioners. Perhaps useful in the way the Pearson correlation is useful.
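A minimal sketch of the coefficient from the paper (Chatterjee's ξ) in the no-ties case: sort the pairs by x, rank the y values in that order, and penalise large jumps between consecutive ranks. The function name and structure here are my own:

```python
import numpy as np

def xi(x, y):
    """Sketch of Chatterjee's coefficient, assuming no ties:
    xi = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1),
    where r_i is the rank of y_i after sorting the pairs by x."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x)                          # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # ranks of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)
```

Note that ξ is asymmetric (xi(x, y) ≠ xi(y, x) in general) and, unlike Pearson's r, it picks up nonlinear relationships such as y = x².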

I'd like to learn more about the small-sample properties. Proofs of asymptotics are necessary but less interesting. The author's results on example data sets do look sensible, though.