Comment by carlmr

1 year ago

>My conclusion at the time was that either "spatial reasoning" doesn't work and/or planning is needed. Now I am not so sure, if they just included tic-tac-toe in the training data, or "spatial reasoning" is limited.

I think it's much simpler than that.

1. With enough training data you can know all winning, losing and drawn games of tic-tac-toe. Even if you don't see all of them in your training data, the properties of the game make many games equivalent once you ignore which symbol each player uses and treat rotated or reflected versions as the same game.

2. The game is so common that it's definitely well represented in training data.

3. With extra "reasoning steps" there can now be a certain amount of error correction on the logic. That's still not equivalent to spatial reasoning, but the model can try a few patterns to see which one wins.

4. A 3x5 grid is probably uncommon enough that the training data doesn't cover enough games for the model to extrapolate properly. But it can still, with a certain probability, check the rules (3 in a row/column/diagonal to win).

5. It might be good to also test alternative grids with more or fewer than 3 in the other dimension as well, since that necessitates a rule change, which would make the game more difficult to reason about.
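
To make points 1 and 4 a bit more concrete, here is a rough Python sketch (my own illustration, not anything the models are known to do). The board encoding (tuples of rows with "X", "O" and "."), and the helper names `canonical` and `has_line`, are assumptions I made up for the example. `canonical` collapses a 3x3 board with its rotations, reflections and symbol-swapped versions into one representative; `has_line` checks for k in a row on any rectangular grid, such as 3x5.

```python
from itertools import product

def rotate(board):
    """Rotate a square board (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))

def reflect(board):
    """Mirror the board left to right."""
    return tuple(row[::-1] for row in board)

def swap(board):
    """Exchange the two player symbols."""
    table = {"X": "O", "O": "X", ".": "."}
    return tuple(tuple(table[c] for c in row) for row in board)

def canonical(board):
    """Smallest of the 16 variants (8 symmetries x 2 symbol assignments)."""
    variants = []
    for b in (board, swap(board)):
        for _ in range(4):
            b = rotate(b)
            variants.append(b)
            variants.append(reflect(b))
    return min(variants)

def has_line(board, player, k=3):
    """True if `player` has k in a row, column or diagonal on any rows x cols grid."""
    rows, cols = len(board), len(board[0])
    for r, c in product(range(rows), range(cols)):
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
            if 0 <= end_r < rows and 0 <= end_c < cols:
                if all(board[r + i * dr][c + i * dc] == player for i in range(k)):
                    return True
    return False

def single_move_boards():
    """All nine 3x3 boards with exactly one X placed."""
    boards = []
    for pos in range(9):
        cells = ["."] * 9
        cells[pos] = "X"
        boards.append(tuple(tuple(cells[i:i + 3]) for i in (0, 3, 6)))
    return boards

# Nine possible opening moves collapse to corner, edge and center.
print(len({canonical(b) for b in single_move_boards()}))  # 3

# A 3x5 grid with a diagonal of X's.
grid_3x5 = (
    tuple("..X.."),
    tuple("...X."),
    tuple("....X"),
)
print(has_line(grid_3x5, "X"))       # True
print(has_line(grid_3x5, "X", k=4))  # False
```

The `k` parameter is where the rule change from point 5 would show up: on a 3x5 grid you can keep `k=3`, but once the short dimension is no longer 3 you have to decide what "winning" even means.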

It has already been said that GPT-4 was trained on all high-quality internet data, so tic-tac-toe should have been included already. It seems to me that o1 has the same or a similar pretraining corpus.

So we have 3 options:

- t3 was now included in the corpus

- t3 was used for RL

- o1 generalizes better