Comment by sitkack
6 months ago
All of the most capable models I use have clearly been trained on the entirety of libgen/z-lib. You know it was the first thing they did; it's only about 100 TB.
Some of the models are even coy about it.
6 months ago
> All of the most capable models I use have clearly been trained on the entirety of libgen/z-lib. You know it was the first thing they did; it's only about 100 TB.
> Some of the models are even coy about it.
The models are not self-aware of their training data. They only know what the internet has said about previous models' training data.
I am not straight-up asking them. We all know the pithy statement about that word.