Comment by akomtu

13 days ago

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.

5 comments


vunderba 13 days ago

My back-of-the-envelope guess would be that 99% of LLMs given the task to build a chess engine would probably just end up implementing a flavor of negamax and calling it a day.

https://en.wikipedia.org/wiki/Negamax
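
For anyone unfamiliar with it, here is a minimal sketch of the negamax idea in Python. It assumes a toy take-1-to-3-stones game rather than chess, and the names `legal_moves`, `negamax` and `best_move` are made up for illustration. A real engine would add a depth limit, an evaluation function and pruning on top of this.

```python
from functools import lru_cache

# Toy game (an assumption for illustration, not chess): one pile of stones,
# each player removes 1 to 3 stones, and whoever takes the last stone wins.

def legal_moves(stones):
    """Return the stone counts the side to move may remove."""
    return [n for n in (1, 2, 3) if n <= stones]

@lru_cache(maxsize=None)
def negamax(stones):
    """Score for the side to move: +1 is a forced win, -1 a forced loss."""
    if stones == 0:
        # The opponent just took the last stone, so the side to move has lost.
        return -1
    # Negamax: my value is the best of the negated values of the
    # positions I can hand to my opponent.
    return max(-negamax(stones - m) for m in legal_moves(stones))

def best_move(stones):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(legal_moves(stones), key=lambda m: -negamax(stones - m))
```

Calling `best_move(7)` searches the whole game tree from a pile of 7. The single `max` over negated child scores is what distinguishes negamax from writing separate minimizing and maximizing branches.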

gpm 13 days ago

Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.

It will lose so badly there will be no point in the comparison.

Besides, you could compare models (and harnesses) directly against each other.

akomtu 13 days ago
Stockfish is a good reference point, an objective measure of how far the LLM's advanced.
- mikkupikku 13 days ago
  
  It's not. Maybe if you used old versions of Stockfish that predate the neural net methods used by current versions, because otherwise you'd be comparing the hand-rolled (by an LLM) position evaluation functions against an NNUE and the results of that are a foregone conclusion; Stockfish will stomp it every time.
  Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.

ykhli 13 days ago

oh that is super interesting. ty for the idea!