Comment by usrusr

1 year ago

Could be a pilot implementation to learn about how to link up external specialist engines. Chess would be the obvious example to start with because the problem is so well known, standardized and specialist engines are easily available. If they ever want to offer an integration like that to customers (who might have some existing rule based engine in house), the need to know everything they can about expected cost, performance.

This doesn't address its terrible performance. If it were touching anything like a real engine it would be playing at a superhuman level, not the level of a upper-tier beginner.

  • The way I read the article is that it's just as terrible as you would expect it to be from pure word association, except for one version that's an outlier in not being terrible at all within a well defined search depth, and again just as terrible beyond that. And only this outlier is the weird thing referenced in the headline.

    I read this as that this outlier version is connecting to an engine, and that this engine happens to get parameterized for a not particularly deep search depth.

    If it's an exercise in integration they don't need to waste cycles on the engine playing awesome - it's enough for validation if the integration result is noticeably less bad than the LLM alone rambling about trying to sound like a chess expert.

    • In this hypothetical, the cycles aren't being wasted on the engine, they're being wasted on running a 200b parameter LLM for longer than necessary in order to play chess badly instead of terribly. An engine playing superhuman chess takes a comparatively irrelevant amount of compute these days.

      If it's fine up to a certain depth it's much more likely that it was trained on an opening book imo.

      What nobody has bothered to try and explain with this crazy theory is why would OpenAI care to do this at enormous expense to themselves?

      1 reply →

  • That would have immediately given away that something must be off. If you want to do this in a subtle way that increases the hype around GPT-3.5 at the time, giving it a good-but-not-too-good rating would be the way to go.