Comment by sitkack

6 months ago

All of the most capable models I use have clearly been trained on the entirety of libgen/z-lib. You know it was the first thing they did; it's only about 100TB.

Some of the models are even coy about it.

The models are not self-aware of their training data. They are only aware of what the internet has said about previous models' training data.