Comment by famouswaffles

9 months ago

Transfer learning during LLM training tends to be 'broader' than that.

Like how:

- Training LLMs on code makes them better at solving reasoning problems
- Training on language Y alongside language X makes them much better at Y than if they were trained on Y alone

and so on.

Probably because, well, gradient descent is a dumb optimizer, and training is more like evolution than a human reading a book.

Also, there is something genuinely weird going on with LLM chess. And it's possible base models are better. https://dynomight.net/more-chess/