← Back to context

Comment by rpigab

2 days ago

> Mathematicians often visualize this problem in terms of spheres. You can think of each code word as a high-dimensional point at the center of a sphere. If an error-filled message (when represented as a high-dimensional point) lives inside a given sphere, you know that the code word at the sphere’s center was the intended message. You don’t want these spheres to overlap — otherwise, a received message might be interpreted in more than one way. But the spheres shouldn’t be too far apart, either. Packing the spheres tightly means you can communicate more efficiently.

I went to math prep school for 2 years, attended 12 hours of math class in agebra and analysis per week, which I think proves I've done more math than most people in the general population, and this makes no sense to me. It either lacks introduction required to understand the analogy, or I've become really dumb. I want to understand this based on what the article says, but I can't. I can't represent error-filled messages as high-dimensional points. It's easier for me to imagine what the intersection between 4D spheres would look like in geometry.

I found this for anyone interested in understanding 4D spheres without knowing too much math: https://baileysnyder.com/interactive-4d/4d-spheres/

I'm not sure if this helps makes things clearer, but see this diagram for symbols in Quadrature Amplitude Modulation [1]. The valid symbols are mapped to certain points in the vector space. Now, imagine non-overlapping circles around each symbol. If a received signal falls within a circle, it would be mapped to that symbol in the centre of the circle.

This can be extended to 3-D or higher dimension spaces.

[1] https://en.wikipedia.org/wiki/Quadrature_amplitude_modulatio...

> I want to understand this based on what the article says, but I can't. I can't represent error-filled messages as high-dimensional points.

Well, start with an analogy. Let's say you and I want to communicate a message, which comes from a set of let's say 4 possible messages: "YES", "NO", "GOOD", and "BYE". Let's further suppose that the medium for this message (the "data channel") is going to be a single point selected from a 2D square. We'll agree beforehand on four points in the square that will represent our four possible messages. Then, you're going to position a dot at one of those points, and I'm going to observe that dot and infer your message from its position.

If the "data channel" is "error-free" (a.k.a. "lossless"), then it really doesn't matter which points we agree on: you could say that the exact center of the square is "YES", the point one millimeter to the left is "NO", the point two millimeters to the left is "GOOD", and so on. But if the data channel is "lossy," then the dot might get shaken around before I observe it. Or equivalently, I might observe its position slightly incorrectly. So we should choose our "code" so as to minimize the effect of this "error."

The best way to do that, on a square, is to place our four "code points" all the way at the four corners of the square, as far away from each other as possible. By "as far away from each other as possible," I mean in the sense of https://en.wikipedia.org/wiki/Pole_of_inaccessibility — I mean we want to maximize the minimum distance between any two points. A mathematician would notice that this is the same thing as maximizing the radius R such that we can draw a circle of radius R around each of our code points without any of the circles intersecting. (R in this case is half of the square's side length.)

If we add a fifth code point, this same reasoning would lead us to place that fifth point right smack in the center of the square. And the sixth point... well, I feel like that gets tricky.

BUT! In actual communications, we don't send messages encoded as real points in 2D squares. We send messages as discrete bit-strings, i.e., strings of zeros-and-ones of length N, which you can see as discrete points at the corners of an N-dimensional hypercube. Then, if we want to send K different messages robust against errors(+), we should pick as our code points some K corners of the hypercube so as to maximize the minimum Manhattan distance along the hypercube's edges between any two code points. This is the basic idea behind error-correcting codes.

A digital error-correcting code is "K code points in a bounded region of N-dimensional hyperspace (namely the discrete set of corners of a unit hypercube), selected so as to maximize the minimum distance between any two of them." The kissing-hyperspheres problem is "K sphere-centers in a bounded region of N-dimensional hyperspace (namely the continuous set of points at unit distance from the origin), selected so as to maximize the minimum distance between any two of them (and then, if that minimum distance is still >=1, increase K and try again)."

If all you meant is "Those two problem statements don't seem 100% equivalent," I think I agree with you. But if you meant you didn't see the similarity at all... well, I hope this helped.

https://en.wikipedia.org/wiki/Pole_of_inaccessibility

https://en.wikipedia.org/wiki/Error_correction_code

(+) — edited to add: Robust against the traditional model of error, i.e., our "threat model" is that any given bit has a constant small probability of getting flipped, so that our observed point may be some random Manhattan walk away from the code point you actually sent. You could instead use a different threat model — e.g. supposing that the bits sent in the actual digital message's "low-order" bits would flip more often than the high-order bits — in which case the optimal selection of code points wouldn't be as simple as "just maximize Manhattan distance."

  • This helped a lot, thanks! I now see a similiarity where I was missing the bridges between geometry and lossy information channels. It's really interesting, though it's a really complex problem.