Comment by selcuka

1 year ago

> chess is an excellent metric for testing logical thought and internal modeling

Is it, though? Apparently nobody else cared to use it to benchmark LLMs until this article.

gs17 1 year ago

People had noticed this exact same discrepancy between 3.5-turbo-instruct and 4 a year ago: https://x.com/GrantSlatton/status/1703913578036904431