Comment by lumost
1 year ago
It could also be as simple as OAI experimenting on different datasets. Perhaps chess games were included in some GPT-3.5 training runs to see whether training on chess would improve other tasks. Perhaps afterwards it was determined that yes, LLMs can play chess, but no, let's not spend time/compute on this.
Would be a shame, because chess is an excellent metric for testing logical thought and internal modeling. An LLM that can pick up a unique chess game halfway through and play it ideally to completion is clearly doing more than "predicting the next token based on the previous one".
> chess is an excellent metric for testing logical thought and internal modeling
Is it, though? Apparently nobody else cared to use it to benchmark LLMs until this article.
People had noticed this exact same discrepancy between 3.5-turbo-instruct and 4 a year ago: https://x.com/GrantSlatton/status/1703913578036904431