Comment by zurfer

1 year ago

> I asked on Twitter for examples of prompts that people had found which failed on GPT-4o but worked on o1-preview.

It seems trivial, but I once spent more than 2 hours trying to get GPT-4 to play tic-tac-toe optimally and failed (CoT prompts, etc.). The result was too many illegal moves and no sign of an optimal strategy.

o1-preview can do it really well [1]

However, when I use a non-standard grid (3x5) it fails to play optimally. But it makes legal moves and it recognized I had won. [2]

My conclusion at the time was that "spatial reasoning" doesn't work and/or planning is needed. Now I am not so sure: maybe they just included tic-tac-toe in the training data, or maybe "spatial reasoning" is limited.

[1] https://chatgpt.com/share/e/66e3e784-26d4-8013-889b-f56a7fed...

[2] https://chatgpt.com/share/e/66e3eae0-2d38-8013-b900-50e6f792...

The non-standard grid objection was an argument against deep learning / chess / Go AIs before AlphaGo and AlphaZero, which showed that self-play can adapt, given enough runs, to any grid size or "priors" in terms of the rules of the game.

In 2014 a professor I learned from said that an AI that learned a specific game was clearly just learning patterns and memorizing, nothing more, and wouldn't be able to adjust the way humans can to, say, new board shapes or rules. (About 1.5 years later, at a lecture, they claimed that "accurate facial recognition is possible. But high recall on facial recognition is impossible, making it useless for surveillance, so don't worry".)

I expect the same will happen with LLMs (though maybe with sufficient "chain of thought" steps rather than game runs, etc.).

> My conclusion at the time was that "spatial reasoning" doesn't work and/or planning is needed. Now I am not so sure: maybe they just included tic-tac-toe in the training data, or maybe "spatial reasoning" is limited.

I think it's much simpler than that.

1. With enough training data you can know all winning, losing and drawn games of tic-tac-toe. Even if you don't see all of them in your training data, the properties of the game make a lot of games equivalent if you don't care about which symbol each player uses or about rotated/reflected versions of the same game.

2. The game is so common that it's definitely well represented in training data.

3. With extra "reasoning steps" there can now be a certain amount of error correction on the logic. That is still not equivalent to spatial reasoning, but the model can try a few patterns to see which will win.

4. A 3x5 grid is probably uncommon enough that the training data doesn't cover enough games for the model to extrapolate properly. But it can still, with a certain probability, check the rules (3 in a row/column/diagonal wins).

5. It might also be good to test alternative grids with more or fewer than 3 in the other dimension, since that necessitates a rule change, which would make the game more difficult to reason about.
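Points 1, 4 and 5 can be made concrete with a brute-force solver. What follows is only a minimal sketch, not a claim about what any of the models do internally: a memoized minimax over a generic rows x cols board where k in a row wins. The names `winning_lines` and `solve` are made up for illustration, and the sketch does not exploit the rotation/reflection equivalence mentioned in point 1.

```python
from functools import lru_cache


def winning_lines(rows, cols, k):
    """Return every run of k cells (as flat, 0-based indices) that wins."""
    lines = []
    for r in range(rows):
        for c in range(cols):
            # Directions: right, down, down-right, down-left.
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < rows and 0 <= end_c < cols:
                    lines.append(
                        tuple((r + i * dr) * cols + (c + i * dc) for i in range(k))
                    )
    return lines


def solve(rows, cols, k):
    """Game value for X, who moves first: 1 = X wins, 0 = draw, -1 = O wins."""
    lines = winning_lines(rows, cols, k)

    @lru_cache(maxsize=None)
    def value(board, player):
        # A completed line ends the game immediately.
        for line in lines:
            mark = board[line[0]]
            if mark != "." and all(board[i] == mark for i in line):
                return 1 if mark == "X" else -1
        if "." not in board:
            return 0  # full board, no winner
        other = "O" if player == "X" else "X"
        target = 1 if player == "X" else -1  # best possible result for the mover
        best = -target                       # start from the worst result
        for i, cell in enumerate(board):
            if cell != ".":
                continue
            child = value(board[:i] + player + board[i + 1:], other)
            best = max(best, child) if player == "X" else min(best, child)
            if best == target:
                break  # a forced win cannot be improved on
        return best

    return value("." * (rows * cols), "X")


print(solve(3, 3, 3))  # 0: standard tic-tac-toe is a draw under optimal play
```

The same function accepts other shapes, e.g. `solve(3, 5, 3)` for the 3x5 grid discussed above, though the search space is much larger and pure Python takes noticeably longer there.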

  • GPT-4 is already said to have been trained on all high-quality internet data, so tic-tac-toe should have been included already. It seems to me that o1 has the same or a similar pretraining corpus.

    So we have 3 options:

    - t3 was now included in the corpus

    - t3 was used for RL

    - o1 generalizes better

The new model does play very well but when it draws the board it frequently places the moves in incorrect locations (but seemingly still keeps track of the correct ones). But I can't fault it too much, I don't think what is essentially ASCII art is intended to be a strength of the model.

Edit: Actually, the third game with it led to it making an illegal move and claiming a draw (which would've been inevitable given optimal play for the rest of the game, but there were several valid moves left to make).
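Both failure modes here (an illegal move and a premature draw claim) are mechanical to detect outside the model. A minimal referee sketch, assuming cells are numbered 1..rows*cols row by row as in the conversations pasted further down the thread; `is_legal` and `status` are hypothetical helper names, and the board is just a dict from cell number to "X" or "O":

```python
def is_legal(board, cell, rows=3, cols=3):
    """A move is legal only on an in-range cell that is still empty."""
    return 1 <= cell <= rows * cols and cell not in board


def status(board, rows=3, cols=3, k=3):
    """Return "X" or "O" for a win, "draw" only on a full board, else "ongoing"."""
    for r in range(rows):
        for c in range(cols):
            # Directions: right, down, down-right, down-left.
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(k)]
                if not all(0 <= rr < rows and 0 <= cc < cols for rr, cc in cells):
                    continue
                # Convert (row, col) back to the 1-based cell numbers.
                marks = {board.get(rr * cols + cc + 1) for rr, cc in cells}
                if len(marks) == 1 and None not in marks:
                    return marks.pop()
    return "draw" if len(board) == rows * cols else "ongoing"


# A position with empty cells and no winner is not a draw yet.
print(status({5: "X", 1: "O", 9: "X", 3: "O", 2: "X"}))  # ongoing
print(is_legal({5: "X", 1: "O"}, 5))                     # False
```

A draw claim is only accepted when `status` returns "draw", so a game that is merely heading for a draw still reports "ongoing" until the board is full.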

The links to conversations 1 and 2 don't work. Please update them!

  • Sorry about that. I just learned that my team account does not allow me to share conversations publicly. Pasting them here for reference:

    * Conversation 1

        let's play tic tac toe. you need to play optimally
        ChatGPT
    
        Sure, I'd be happy to play tic-tac-toe with you! Let's set up the game.
    
        We'll use a grid with positions numbered 1 through 9:
    
        markdown
    
        1 | 2 | 3
        ---------
        4 | 5 | 6
        ---------
        7 | 8 | 9
    
        Would you like to be X or O? And would you like to go first or second?
        You said:
        X on 5
        ChatGPT
    
        Got it! You placed X at position 5.
    
        I'll place O at position 1.
    
        Here's the updated board:
    
        markdown
    
        O |  2 |  3
        -------------
        4 | X | 6
        -------------
        7 |  8 | 9
    
        Your turn! Where would you like to place your next X?
        You said:
        X on 9
        ChatGPT
    
        Got it! You placed X at position 9.
    
        Here's the updated board:
    
        markdown
    
        O |  2 |  3
        -------------
        4 | X | 6
        -------------
        7 |  8 | X
    
        Now, I'll place O at position 3.
    
        Here's the board now:
    
        markdown
    
        O |  2 | O
        -------------
        4 | X | 6
        -------------
        7 |  8 | X
    
        Your turn! Where would you like to place your next X?
        You said:
        X on 2
        ChatGPT
    
        Got it! You placed X at position 2.
    
        Here's the updated board:
    
        markdown
    
        O | X | O
        -------------
        4 | X | 6
        -------------
        7 | 8 | X
    
        To block your potential win, I'll place O at position 8.
    
        Here's the board now:
    
        markdown
    
        O | X | O
        -------------
        4 | X | 6
        -------------
        7 | O | X
    
        Your turn! Where would you like to place your next X?
        You said:
        X on 4
        ChatGPT
    
        You placed X at position 4.
    
        Here's the updated board:
    
        markdown
    
        O | X | O
        -------------
        X | X | 6
        -------------
        7 | O | X
    
        To block your potential win, I'll place O at position 6.
    
        Here's the board now:
    
        markdown
    
        O | X | O
        -------------
        X | X | O
        -------------
        7 | O | X
    
        Your turn! Where would you like to place your next X?
        You said:
        X on 7
        ChatGPT
    
        You placed X at position 7.
    
        Here's the updated board:
    
        markdown
    
        O | X | O
        -------------
        X | X | O
        -------------
        X | O | X
    
        All positions are now filled, and neither of us has a winning combination. The game is a draw! Well played!
    
        Would you like to play again?
    
    
    

    * Conversation 2 with custom grid

        let's play tic tac toe. you need to play optimally. we use a 3x5 grid. a line of 3 wins.
        ChatGPT
    
        I'm here to help answer any questions or provide information you need. Let me know how I can assist you!
        You said:
        please draw the board, so I can make my first move
        ChatGPT
    
        Sure! Here's the 3x5 Tic Tac Toe board. Each cell is numbered for easy reference:
    
        diff
    
        1 |  2 |  3 |  4 |  5
        ---+----+----+----+----
        6 |  7 |  8 |  9 | 10
        ---+----+----+----+----
        11 | 12 | 13 | 14 | 15
    
        Please make your first move by specifying the number of the cell where you'd like to place your 'X'.
        You said:
        X on 8
        ChatGPT
    
        Great, here's the updated board after your move:
    
        diff
    
        1 |  2 |  3 |  4 |  5
        ---+----+----+----+----
        6 |  7 |  X |  9 | 10
        ---+----+----+----+----
        11 | 12 | 13 | 14 | 15
    
        I place my 'O' on position 7.
    
        diff
    
        1 |  2 |  3 |  4 |  5
        ---+----+----+----+----
        6 |  O |  X |  9 | 10
        ---+----+----+----+----
        11 | 12 | 13 | 14 | 15
    
        It's your turn!
        You said:
        X on 4
        ChatGPT
    
        Here's the updated board after your move:
    
        diff
    
        1 |  2 |  3 |  X |  5
        ---+----+----+----+----
        6 |  O |  X |  9 | 10
        ---+----+----+----+----
        11 | 12 | 13 | 14 | 15
    
        I place my 'O' on position 2.
    
        diff
    
        1 |  O |  3 |  X |  5
        ---+----+----+----+----
        6 |  O |  X |  9 | 10
        ---+----+----+----+----
        11 | 12 | 13 | 14 | 15
    
        It's your turn!
        You said:
        X on 12
        ChatGPT
    
        Here's the updated board after your move:
    
        diff
    
        1  |  O |  3  |  X  |  5
        ----+----+-----+-----+----
        6  |  O |  X  |  9  | 10
        ----+----+-----+-----+----
        11  |  X | 13  | 14  | 15
    
        Congratulations! You've formed a diagonal line of three 'X's at positions 4, 8, and 12. You win the game!
    
        Great job!

    •   Sure! Here's the 3x5 Tic Tac Toe board. Each cell is numbered for easy reference:
      
          diff
      

      I'm presuming that copy-paste ate the ``` part. I found it interesting that in the first chat it correctly(?) used a markdown code fence, but in the 2nd chat it chose diff syntax for its table. I suppose that rendered the text in a monospace font?