Comment by amelius

1 year ago

To be fair, they say

> Theory 2: GPT-3.5-instruct was trained on more chess games.

If that were the case, fine-tuning a big Llama model on a large number of chess games should also produce good results. It didn't.

The only way the theory could still hold is if GPT-3.5-instruct simply recognized the game and replayed the answer from memory.

  • Do you have a link to the results from fine-tuning a Llama model on chess? How do they compare to the base models discussed in the article here?