Comment by tedsanders

1 year ago

Eh, OpenAI really isn't as shady as hell, from what I've seen on the inside for 3 years. Rubik's cube hand was before me, but in my time here I haven't seen anything I'd call shady (though obviously the non-disparagement clauses were a misstep that's now been fixed). Most people are genuinely trying to build cool things and do right by our customers. I've never seen anyone try to cheat on evals or cheat customers, and we take our commitments on data privacy seriously.

I was one of the first people to play chess against the base GPT-4 model, and it blew my mind by how well it played. What many people don't realize is that chess performance is extremely sensitive to prompting. The reason gpt-3.5-turbo-instruct does so well is that it can be prompted to complete PGNs. All the other models use the chat format. This explains pretty much everything in the blog post. If you fine-tune a chat model, you can pretty easily recover the performance seen in 3.5-turbo-instruct.