
Comment by roenxi

4 years ago

Either Chatterjee has made a mistake, that claim is wrong, or that definition carries some extremely precise meaning, because:

    > counterexample = tibble(x = 1:6, y = c(1.1,-1.2,1.3,-1.4,1.5,-1.6))
    > counterexample
    # A tibble: 6 × 2
          x     y
      <int> <dbl>
    1     1   1.1
    2     2  -1.2
    3     3   1.3
    4     4  -1.4
    5     5   1.5
    6     6  -1.6
    > XICOR::calculateXI(xvec = counterexample$x, yvec=counterexample$y)
    [1] -0.2857143
    > -0.2857143 == 1
    [1] FALSE

And I assume you could get data like that in the wild by sampling a sine wave as a time series.

I think the abstract is a bit strong; it probably means "converges to" for a large repeated sample.
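To check this numerically without the XICOR package, here is a sketch in Python (my own translation, not code from the thread) of the tie-free form of the estimator from the paper, ξ_n = 1 − 3Σ|r_{i+1} − r_i|/(n² − 1), where r_i is the rank of y_i after sorting the pairs by x. It reproduces the −0.2857 value above, and shows that both the extended alternating pattern and a sine wave sampled just below its Nyquist rate drive the estimate toward −1/2:

```python
import math

def xi_no_ties(xs, ys):
    """Tie-free form of Chatterjee's estimator (assumes distinct y values):
    xi_n = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1),
    where r_i is the rank of y_i after sorting the pairs by x."""
    n = len(ys)
    ys_by_x = [y for _, y in sorted(zip(xs, ys))]
    rank = {y: i + 1 for i, y in enumerate(sorted(ys_by_x))}
    r = [rank[y] for y in ys_by_x]
    return 1 - 3 * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / (n**2 - 1)

# roenxi's counterexample, reproducing XICOR's value of -2/7:
print(xi_no_ties(range(1, 7), [1.1, -1.2, 1.3, -1.4, 1.5, -1.6]))  # -0.2857...

# The same alternating pattern at n = 10000 approaches the -1/2 minimum:
alt = [(-1) ** (i + 1) * (1 + i / 10) for i in range(1, 10001)]
print(xi_no_ties(range(1, 10001), alt))  # close to -1/2

# A sine wave sampled at a rate just below Nyquist alternates the same way:
sine = [math.sin(3.14 * i) for i in range(1, 1001)]
print(xi_no_ties(range(1, 1001), sine))  # also strongly negative
```

(The 3.14 step size is a hypothetical choice: it is just under π, so consecutive samples alternate in sign, mimicking the counterexample.)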

There is no mistake in the definition, and this is all elaborated upon on page 4 of the article.

Quote: "On the other hand, it is not very hard to prove that the minimum possible value of ξn(X, Y) is −1/2 + O(1/n), and the minimum is attained when the top n/2 values of Yi are placed alternately with the bottom n/2 values. This seems to be paradoxical, since Theorem 1.1 says that the limiting value is in [0, 1]. The resolution is that Theorem 1.1 only applies to i.i.d. samples. Therefore a large negative value of ξn has only one possible interpretation: the data does not resemble an i.i.d. sample."

You propose Y = f(X) = (-1)^X * (1+X/10) as your functional relation, which is measurable if X is discrete; and indeed, if we let x_n = n, then the limiting value of the estimator will be -1/2, not 1.

However, this is just a particular value of the estimator on a particular sample of (x, y). The theorem is an "almost surely" statement, which means it is allowed to fail on a set of samples of probability 0.

Indeed, if we actually picked a random sample of (X, f(X)) with your f, then regardless of the distribution of X, since X is discrete we would expect to see many ties (infinitely many ties as the number of samples goes to infinity). This would mean that |r_{i+1} - r_i| is 0 for all but finitely many i, and so the estimator would be 1.
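To make the ties argument concrete, here is a sketch (in Python, my own choice, not code from the thread) of the tie-adjusted form of the estimator as defined in the paper: sort the pairs by x with ties broken at random, set r_i = #{j : y_j ≤ y_i} and l_i = #{j : y_j ≥ y_i}, and take ξ_n = 1 − n Σ|r_{i+1} − r_i| / (2 Σ l_i(n − l_i)). On an i.i.d. sample of a discrete X with the same f, the many ties pull the estimate up toward 1 rather than down toward −1/2:

```python
import random
from bisect import bisect_left, bisect_right

def xi(xs, ys):
    """Tie-adjusted Chatterjee estimator, following the paper's definition:
    sort by x (ties in x broken at random), then
    xi_n = 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i * (n - l_i))."""
    n = len(xs)
    pairs = sorted(zip(xs, ys), key=lambda p: (p[0], random.random()))
    y = [p[1] for p in pairs]
    y_sorted = sorted(y)
    r = [bisect_right(y_sorted, v) for v in y]     # r_i = #{j : y_j <= y_i}
    l = [n - bisect_left(y_sorted, v) for v in y]  # l_i = #{j : y_j >= y_i}
    num = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    den = 2 * sum(li * (n - li) for li in l)
    return 1 - num / den

random.seed(1)
# An i.i.d. sample of a discrete X, with roenxi's f applied to it:
xs = [random.randint(1, 6) for _ in range(6000)]
ys = [(-1) ** x * (1 + x / 10) for x in xs]
print(xi(xs, ys))  # close to 1, not -1/2
```

On tie-free data this reduces to the usual 1 − 3Σ|r_{i+1} − r_i|/(n² − 1) form, so it also reproduces the −2/7 value from the counterexample above.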

This also covers the case of f being a hash function, as mentioned before. Worse yet, a hash function takes only finitely many different values, so once again there are infinitely many ties.

The way the estimator is defined, it handles the (X, f(X)) case fairly easily: for a typical sample the x values cluster, and for a measurable function this implies that the resulting y values will be close together, so the rank differences will be small.
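The clustering point can be illustrated the same way (again a Python sketch of my own, with a made-up measurable f, not code from the thread): for a continuous X the sorted x values sit close together, and even a function with jumps sends nearby x to nearby y except at finitely many breakpoints, so the rank differences stay small and the estimate lands near 1:

```python
import random

def xi_no_ties(xs, ys):
    """Tie-free Chatterjee estimator (assumes distinct x and y values):
    xi_n = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)."""
    n = len(ys)
    ys_by_x = [y for _, y in sorted(zip(xs, ys))]
    rank = {y: i + 1 for i, y in enumerate(sorted(ys_by_x))}
    r = [rank[y] for y in ys_by_x]
    return 1 - 3 * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / (n**2 - 1)

random.seed(2)
# Continuous X, and a measurable f with jumps at 0.1, 0.2, ..., 0.9:
xs = [random.random() for _ in range(2000)]
ys = [x if int(10 * x) % 2 == 0 else -x for x in xs]
print(xi_no_ties(xs, ys))  # close to 1: nearby x values give nearby ranks
```

Between the nine jump points f is monotone, so consecutive x-sorted samples differ in rank by 1 almost everywhere; only the handful of boundary crossings contribute large rank jumps.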

This discussion probably wasn't included in the abstract, since it's fairly simple measure theory that most experts reading the article will be intimately familiar with.

As an alternative to cirpus's reply, and noting that roenxi's example uses the R package supplied by the paper's author (suggesting a real interest in understanding), I again refer to page 4 of the article.

"it is not very hard to prove that the minimum possible value of [the proposed measure] is −1/2 + O(1/n), and the minimum is attained when the top n/2 values of Yi are placed alternately with the bottom n/2 values. ...Theorem 1.1 only applies to i.i.d. samples. Therefore a large negative value of [the measure] has only one possible interpretation: the data does not resemble an i.i.d. sample."

roenxi, I congratulate you on your example; it shows that you are working to understand the measure. "Where does it break?" is always a good question.