Comment by MarkusQ

2 days ago

That's not how information works. The outcome you'd learn more from is necessarily the less likely one, so the expected information (the sum, over the two possible outcomes, of the outcome's probability times the information that outcome carries) is always less than or equal to one bit.

If all you can get is a "true" or "false" you expect, at most, one bit of information.

Right - but coming back to the original question: if I'm not mistaken, the explanation is that the blog post is measuring information gained from an actual outcome, as opposed to _expected_ information gain. An example will help:

Say you're trying to guess the number on a 6-sided die that I've rolled. If I wanted to outright tell you the answer, that would be about 2.58 bits of information I'd need to convey. But you're trying to guess it without me telling you, so suppose you can ask a yes-or-no question about the outcome. The maximum _expected_ information gain is 1 bit. If you ask "was it 4 or greater?", that is an optimal question, because it maximizes the worst case: whichever answer you get, you gain exactly 1 bit, so the minimum you can gain is also the maximum. However, suppose you ask "was it a 5?". This is a bad question, because if the answer is no, there are still 5 numbers it could be, and the likelihood of it being 'no' is high: 5/6. Despite these downsides, it is true that 1/6 of the time the answer WILL be yes, and you will gain all 2.58 bits of information in one go. But the downside case more than counteracts this and preserves the rules of information theory: the _expected_ information gain is still < 1 bit.
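To make the arithmetic concrete, here's a quick Python sketch of the two questions; it uses nothing beyond the numbers already in the example (information from an answer = log2 of how much the set of possibilities shrinks):

```python
from math import log2

# "Was it 4 or greater?" splits {1..6} into two groups of 3:
bits_either_way = log2(6 / 3)                  # 1.0 bit no matter the answer

# "Was it a 5?" splits {1..6} into groups of 1 and 5:
bits_if_yes = log2(6 / 1)                      # ~2.58 bits, but only 1/6 of the time
bits_if_no  = log2(6 / 5)                      # ~0.26 bits, 5/6 of the time
expected    = (1/6) * bits_if_yes + (5/6) * bits_if_no
print(bits_either_way, bits_if_yes, bits_if_no, expected)   # expected ~0.65 < 1
```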

EDIT: D'oh, nevermind. Re-reading the post, it's definitely talking about >1 bit expectations of potential matchings. So I don't know!

It's not a yes/no per contestant, it's per edge between contestants. There are n(n-1)/2 of these.

A "true" answer for a potential match is actually a state update for the (n-1) edges touching each of the two contestants: besides the confirmed edge itself, that's 2(n-2) edges that can be updated to false. Some of these may already be known from previous rounds' matchups, but it's still more than a single binary.
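To see how this cashes out in bits, here's a small brute-force sketch under the model this comment suggests: n contestants, a hidden pairing of them into n/2 couples, all (n-1)!! pairings equally likely, and a truth booth that confirms or denies one candidate pair. The n, the pair tested, and the uniform prior are illustrative assumptions, not something stated in the thread:

```python
from math import log2

def perfect_matchings(people):
    """All ways to split an even-sized list of contestants into pairs."""
    people = sorted(people)
    if not people:
        return [frozenset()]
    first, rest = people[0], people[1:]
    result = []
    for partner in rest:
        remaining = [p for p in rest if p != partner]
        for sub in perfect_matchings(remaining):
            result.append(sub | {frozenset({first, partner})})
    return result

n = 8                                   # illustrative size, not from the thread
all_m = perfect_matchings(range(n))     # (n-1)!! = 105 pairings for n = 8
pair = frozenset({0, 1})                # the pair sent to the truth booth
yes_world = [m for m in all_m if pair in m]
no_world  = [m for m in all_m if pair not in m]

bits_yes = log2(len(all_m) / len(yes_world))   # log2(n-1) ~ 2.81 bits
bits_no  = log2(len(all_m) / len(no_world))    # log2((n-1)/(n-2)) ~ 0.22 bits
p_yes = len(yes_world) / len(all_m)            # 1/(n-1)
print(bits_yes, bits_no)
print(p_yes * bits_yes + (1 - p_yes) * bits_no)   # ~0.59 expected bits, still < 1
```

So for n = 8 a "yes" is worth log2(7) ≈ 2.8 bits about the full pairing (and, as above, marks 2(n-2) = 12 other edges false), but it only arrives 1 time in 7, so the expected gain per truth booth still comes in under one bit.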

  • An answer of "yes" will generally eliminate many edges, with the potential for >1 bit. An answer of "no", however, will usually eliminate just that one edge, which is worth <1 bit.

  • But you don't receive more than a single binary value; you get a yes or no.

    If both of these are equally likely, you gain one bit of information, the maximum possible amount. If you already have other information about the situation, you might gain _less_ than one bit on average (because it confirms something you already knew, but doesn't provide any new information), but you can't gain more.

    • If I’m trying to guess a 9-letter English word, and test whether the first letter is “x”, there are only the same two answers: Yes/No.

      But “Yes” obviously gives me much more than one bit of the information I need to know the answer (the sketch after this thread puts rough numbers on it).

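To put rough numbers on that first-letter test: the 1% figure below is purely a stand-in (the real frequency of 9-letter words starting with "x" isn't in the thread), but it shows how a rare "yes" can be worth many bits while the expectation from asking stays capped at one:

```python
from math import log2

def surprisal(p):
    """Bits you actually learn when an answer of probability p turns up."""
    return -log2(p)

def expected_bits(p_yes):
    """Average bits from a yes/no answer whose 'yes' probability is p_yes."""
    return p_yes * surprisal(p_yes) + (1 - p_yes) * surprisal(1 - p_yes)

p = 0.01                    # hypothetical share of candidate words starting with "x"
print(surprisal(p))         # ~6.64 bits -- a lucky "yes" really is worth a lot
print(expected_bits(p))     # ~0.08 bits expected from asking the question

# The expectation tops out at exactly 1 bit, at p = 0.5:
print(max(expected_bits(k / 1000) for k in range(1, 1000)))   # ~1.0
```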

I’m not really following. But if you’re told that one of A, B, or C is true, you learn more by being told A is true than if you learn D is true, no?

  • Yes, you learn more than 1 bit in that case. However, if you are told A is false, you still don't know whether B or C is true, so you gain less than 1 bit. Assuming A, B, and C all have equal probability, your average/expected information gain is <1 bit (the sketch after this thread works out the exact numbers).

    If you ask the question "which of A, B, or C is true?" then you're not asking a yes/no question, and it's not surprising that you expect to gain more than 1 bit of information.

    • but that’s all consistent. “Expected” gain is less than 1 bit for the truth booths, even though the actual gain is sometimes > 1; and the expected gain from the matchups is > 1 bit, because those aren’t binary questions.

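And the worked numbers for the A/B/C case above, assuming equal priors as in the parent comment:

```python
from math import log2

# Exactly one of A, B, C is true; each has prior probability 1/3.
bits_told_A_true  = log2(3)        # 3 candidates -> 1: ~1.58 bits
bits_told_A_false = log2(3 / 2)    # 3 candidates -> 2: ~0.58 bits
expected_yes_no = (1/3) * bits_told_A_true + (2/3) * bits_told_A_false
print(expected_yes_no)             # ~0.92 bits: under 1, as the reply says

# Asking the non-binary question "which of A, B, or C?" instead:
print(log2(3))                     # ~1.58 bits expected, since it's a 3-way answer
```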