Comment by selcuka
1 year ago
> chess is an excellent metric for testing logical thought and internal modeling
Is it, though? Apparently nobody else cared to use it to benchmark LLMs until this article.
People had noticed this exact same discrepancy between GPT-3.5-turbo-instruct and GPT-4 a year ago: https://x.com/GrantSlatton/status/1703913578036904431