Comment by chvid

1 year ago

Theory 5: GPT-3.5-instruct plays chess by calling a traditional chess engine.

Just think about the trade-off from OpenAI's side here: they're going to add a bunch of complexity to GPT-3.5 to let it call out to engines (either an external system monitoring all outputs for chess-related content, or some kind of tool-assisted CoT, for instance), just so it can play chess incorrectly a high percentage of the time, and even when it doesn't, at a mere 1800 Elo level? In return for some mentions in a few relatively obscure blog posts? Doesn't make any sense to me as an explanation.

  • But there could be a simple explanation. For example, they could have tested many "engines" when developing function calling and simply left them in there, and this one happened to connect to a basic chess-playing algorithm, nothing sophisticated.

    Also, it makes a lot of sense if you expect people to play chess against the LLM, especially if you are later training future models on the chats.

    • This still requires a lot of coincidences: they chose a terrible chess engine for their external tool (why?), they left it running in the background for all calls via all APIs but only for gpt-3.5-turbo-instruct (why?), and they saw business value in this specific model being good at chess versus other things (why?).

      You say it makes sense, but how does it make sense for OpenAI to add overhead to all of its API calls for the super-niche case of people playing against an 1800 Elo chess chatbot (one that often plays illegal moves; you can go try it yourself)?

  • Could be a pilot implementation to learn how to link up external specialist engines. Chess would be the obvious example to start with because the problem is so well known and standardized, and specialist engines are easily available. If they ever want to offer an integration like that to customers (who might have some existing rule-based engine in-house), they need to know everything they can about expected cost and performance.

    • This doesn't address its terrible performance. If it were touching anything like a real engine, it would be playing at a superhuman level, not the level of an upper-tier beginner.

Sorry, this is just conspiracy theorizing. I've tried it with GPT-3.5-instruct myself in the OpenAI playground, where the model clearly does nothing but auto-regression. No function calling there whatsoever.

Occam’s razor. I could build a good chess-playing wrapper around the OpenAI API (any version) that would consult a chess engine when presented with any board scenario, and introduce some randomness so that it doesn’t play too well (see the sketch below).

I can’t imagine any programmer in this thread would be entertaining a more complicated scenario than this. You can substitute chess for any formal system that has a reliable oracle.
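
For concreteness, here is a minimal sketch of such a wrapper, assuming python-chess and a local Stockfish binary; the engine path, search depth, and pick-from-the-top-5 weakening are illustrative assumptions, not a claim about anything OpenAI actually runs:

```python
import random

import chess
import chess.engine

ENGINE_PATH = "/usr/bin/stockfish"  # assumption: wherever Stockfish is installed

def weakened_move(engine: chess.engine.SimpleEngine,
                  board: chess.Board, top_n: int = 5) -> chess.Move:
    """Ask the engine for its top candidate lines, then pick one at random,
    so the wrapper always plays legal moves without superhuman strength."""
    infos = engine.analyse(board, chess.engine.Limit(depth=10), multipv=top_n)
    candidates = [info["pv"][0] for info in infos if "pv" in info]
    return random.choice(candidates)

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
board = chess.Board()
board.push_san("e4")  # the position as it might be reconstructed from the chat
print(board.san(weakened_move(engine, board)))  # wrapper's randomized reply
engine.quit()
```

Sampling among the engine's top lines is one simple way to introduce the randomness so it doesn't play too well.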

Yes! I was also waiting for this seemingly obvious answer in the article. Hopefully the author will see these comments.

I have this hypothesis as well: that OpenAI added a lot of "classic" algorithms and rules over time (e.g., rules for filtering, etc.).