Comment by 9dev
1 year ago
That would have immediately given away that something must be off. If you want to do this in a subtle way that increases the hype around GPT-3.5 at the time, giving it a good-but-not-too-good rating would be the way to go.
If you want to keep adding conditions to an already-complex theory, you'll need an equally complex set of observations to justify it.
You're the one imposing an additional criterion, that OpenAI must have chosen the highest setting on a chess engine, and demanding that this additional criterion be used to explain the facts.
I agree with GP that if a 'fine tuning' of GPT-3.5 came out of the gate playing at top Stockfish level, people would have been extremely suspicious. So in my accounting of the unknowns here, the fact that it doesn't play at the top level provides no additional information with which to resolve the question.
That's not an additional criterion, it's simply the most likely version of this hypothetical: a superhuman engine is much easier to integrate than an 1800-Elo engine that makes invalid moves, for the simple reason that the vast majority of chess engines play above 1800 Elo out of the box and never make invalid moves (on a log scale, they're actually way past that level).
This doesn't require the "highest" settings; it requires any settings whatsoever.
But anyway to spell out some of the huge list of unjustified conditions here:
1. OpenAI spent a lot of time and money engineering chess into 3.5-turbo-instruct via an external call.
2. They used a terrible chess engine for some reason.
3. They did this deliberately because they didn't want to get "caught" for some reason.
4. They removed this functionality in all other versions of GPT, for some reason... and so on.
Much simpler theory:
1. They used more chess data when training that model.
(There are other competing, much simpler theories too.)