
Comment by gs17

1 year ago

> For the OpenAI models he generated up to 10 different outputs until he got one that was legal, or just randomly chose a move if it failed.

I wonder how often they failed to generate a move. That feels like it could be a meaningful difference.
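
To make the question concrete, the retry-then-random procedure from the quote could be sketched roughly like this. It is only a minimal sketch under my own assumptions, not the code used in the experiment: `pick_move`, `generate`, and `fallback_rate` are hypothetical names, `generate` stands in for whatever call samples one move string from the model, and moves are compared as plain strings against a list of legal moves that is assumed to be non-empty.

```python
import random
from typing import Callable, Iterable, Optional, Sequence, Tuple


def pick_move(
    generate: Callable[[], str],
    legal_moves: Sequence[str],
    max_attempts: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, fell_back).

    Samples from `generate` up to `max_attempts` times and returns the
    first output that is a legal move. If every attempt is illegal, a
    random legal move is returned and `fell_back` is True.
    """
    rng = rng or random.Random()
    legal = set(legal_moves)
    for _ in range(max_attempts):
        candidate = generate().strip()
        if candidate in legal:
            return candidate, False
    # Hypothetical fallback: uniform choice among the legal moves.
    return rng.choice(list(legal_moves)), True


def fallback_rate(flags: Iterable[bool]) -> float:
    """Fraction of moves that had to fall back to a random choice."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Logging the `fell_back` flag for every move and passing the flags to `fallback_rate` would give the number I'm asking about: the share of a model's moves that were actually random rather than generated.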


famouswaffles 1 year ago

gpt-3.5-turbo-instruct had something like 5 (or fewer) illegal moves in 8205:

https://github.com/adamkarvonen/chess_gpt_eval

I expect the rest to be much worse, if GPT-4's performance is any indication.

gs17 1 year ago

And the most notable part of that:

> Most of gpt-4's losses were due to illegal moves

3.5-turbo-instruct definitely has some better chess skills.