Comment by akomtu

13 days ago

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.

Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.

It will lose so badly there will be no point in the comparison.

Besides, you could compare models (and harnesses) directly against each other.

  • Stockfish is a good reference point, an objective measure of how far the LLM has advanced.

    • It's not. Maybe if you used old versions of Stockfish that predate the neural-net methods used by current versions. Otherwise you'd be comparing position evaluation functions hand-rolled by an LLM against an NNUE, and the result of that is a foregone conclusion: Stockfish will stomp it every time.

      Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.