Comment by akomtu
13 days ago
It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.
My back-of-the-envelope guess would be that 99% of LLMs given the task to build a chess engine would probably just end up implementing a flavor of negamax and calling it a day.
https://en.wikipedia.org/wiki/Negamax
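For anyone who hasn't seen it, here is a minimal sketch of the negamax idea, shown on a toy one-pile Nim game rather than chess. The helper names (`nim_moves`, `nim_apply`, `nim_eval`) and the game itself are made up for illustration; a real engine would plug in move generation and a position evaluation, and would normally add alpha-beta pruning on top.

```python
def negamax(state, depth, moves, apply_move, evaluate):
    """Score `state` from the point of view of the side to move."""
    legal = list(moves(state))
    if depth == 0 or not legal:
        # Leaf: fall back to the static evaluation.
        return evaluate(state)
    best = float("-inf")
    for move in legal:
        # The opponent's best score, negated, is our score for this move.
        score = -negamax(apply_move(state, move), depth - 1,
                         moves, apply_move, evaluate)
        best = max(best, score)
    return best


# Hypothetical toy game: one pile, take 1 to 3 stones, taking the last stone wins.
def nim_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]


def nim_apply(stones, n):
    return stones - n


def nim_eval(stones):
    # With no stones left, the side to move has already lost.
    return -1 if stones == 0 else 0


if __name__ == "__main__":
    for pile in (4, 5):
        print(pile, negamax(pile, 10, nim_moves, nim_apply, nim_eval))
```

The single negation is the whole trick: because the game is zero-sum, one function serves both players, instead of separate min and max branches.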
Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.
It will lose so badly there will be no point in the comparison.
Besides, you could compare models (and harnesses) directly against each other.
Stockfish is a good reference point, an objective measure of how far the LLM's advanced.
It's not. It might be if you used old versions of Stockfish that predate the neural-net methods used by current versions. Otherwise you'd be comparing hand-rolled (by an LLM) position evaluation functions against an NNUE, and the result of that is a foregone conclusion: Stockfish will stomp it every time.
Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.
Oh, that is super interesting. Thanks for the idea!