
Comment by _as_text

1 year ago

I have now read the paper and it alone is enough to make me seriously consider devoting a significant amount of my time to the author's project. Here's why:

In my introductory statistics class, I learned that an independent and identically distributed sample is a sequence of random variables X[1], ..., X[n], all with the same signature Omega -> (usually) R. All of them are pairwise independent, and all of them have the same distribution, e.g. the same density function. Elsewhere in probability I had learned that two random variables with the same density function are, for all intents and purposes, the same.
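
Concretely, "the same for all intents and purposes" is usually taken to mean that every quantity determined by the law alone agrees. A toy Python sketch (my own example, nothing from the paper; the die and the event A are arbitrary choices):

```python
# A toy sketch (my own example) of what "same density, hence the same for all
# intents and purposes" is usually taken to mean: every quantity that depends
# only on the law comes out identical.
from fractions import Fraction

# One fair-die sample space: Omega = {1, ..., 6}, uniform measure.
P0 = {w: Fraction(1, 6) for w in range(1, 7)}

X = lambda w: w        # the face shown
Y = lambda w: 7 - w    # the opposite face: a different function on Omega

def law(Z):
    """Push-forward measure (the 'distribution' of Z)."""
    out = {}
    for w, p in P0.items():
        out[Z(w)] = out.get(Z(w), Fraction(0)) + p
    return out

def mean(Z):
    return sum(Z(w) * p for w, p in P0.items())

A = {1, 2, 3}
prob_in_A = lambda Z: sum(p for w, p in P0.items() if Z(w) in A)

print(law(X) == law(Y))               # True: identical distributions
print(mean(X) == mean(Y))             # True: identical expectations
print(prob_in_A(X) == prob_in_A(Y))   # True: identical event probabilities
```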

For all, really? Let's take X[i] and X[j] from some i.i.d. random sample, with i != j. They have the same density, which on that reading leads us to write X[i] = X[j]. They are also independent, hence

P(X[i] in A, X[j] in A) = P(X[i] in A)*P(X[j] in A),

but X[i] = X[j], so

P(X[i] in A, X[j] in A) = P(X[i] in A, X[i] in A) = P(X[i] in A).

Combining the two gives P(X[i] in A) = P(X[i] in A)^2, hence

P(X[i] in A) in {0, 1} for every event A.
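
A rough Monte Carlo sanity check (my own sketch; I'm assuming standard normal draws and the event A = {x > 0}, neither of which comes from the paper) makes the clash visible: genuinely distinct copies factorize, whereas reading X[j] as literally X[i] collapses the joint probability to P(X[i] in A):

```python
# Distinct independent copies: the joint probability is approximately the
# product (~0.25). If we pretend X[j] is literally X[i], the "joint"
# probability is just P(X[i] in A) (~0.5), which is not the product.
import random

n = 200_000
xi = [random.gauss(0, 1) for _ in range(n)]   # X[i]
xj = [random.gauss(0, 1) for _ in range(n)]   # X[j], an independent copy

in_A = lambda x: x > 0.0

p_i = sum(in_A(x) for x in xi) / n
p_j = sum(in_A(x) for x in xj) / n

joint_distinct = sum(in_A(a) and in_A(b) for a, b in zip(xi, xj)) / n
joint_equal = sum(in_A(a) and in_A(a) for a in xi) / n

print(round(joint_distinct, 3), round(p_i * p_j, 3))  # approximately equal
print(round(joint_equal, 3), round(p_i, 3))           # equal, but not the product
```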

This was a real problem for me, and I believe I did worse in that statistics class than I would have if the concept had been introduced properly. It took me a while to work out a resolution. Of course, you can now see that the strict equality X[i] = X[j] is indefensible, in the sense that in general X[i](omega) != X[j](omega) for some atom omega. If you think about what must be true of Omega for it to carry two different variables X[i], X[j]: Omega -> R that are i.i.d., it turns out that you need Omega to be a categorical product of two probability spaces:

Omega = Omega[i] x Omega[j]

and X[i] (resp. X[j]) to be the same variable X composed with the projection onto the first (resp. second) factor. This definition of "sampling with replacement" withstands all scrutiny.
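
Here is that construction spelled out on a toy three-point space (the particular Omega[0] and X are my own choices, purely illustrative):

```python
# A sketch of the product-space construction described above.
from fractions import Fraction
from itertools import product

# A single probability space (Omega_0, P_0) and one variable X on it.
P0 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
X = {"a": 0, "b": 1, "c": 1}

# Omega = Omega_0 x Omega_0 with the product measure.
P = {(w1, w2): P0[w1] * P0[w2] for w1, w2 in product(P0, P0)}

Xi = lambda w: X[w[0]]   # X composed with projection onto the first factor
Xj = lambda w: X[w[1]]   # X composed with projection onto the second factor

def prob(event):
    return sum(p for w, p in P.items() if event(w))

A = {1}  # an event in the range of X
lhs = prob(lambda w: Xi(w) in A and Xj(w) in A)
rhs = prob(lambda w: Xi(w) in A) * prob(lambda w: Xj(w) in A)

print(lhs == rhs)                        # True: Xi and Xj are independent
print(any(Xi(w) != Xj(w) for w in P))    # True: they are not equal pointwise
```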

Of course, just like in Buzzard's example of ring localization, it was all caused by someone being careless with equality.