Comment by Filligree

1 month ago

We’re nowhere near ingesting the whole internet.

Though personally, I think we’re missing whatever architecture / mathematical breakthrough will make online learning (or even offline incremental learning, i.e. dreams) work.

At that point we could give the AI a robot body and train it on lived experience.

> "We’re nowhere near ingesting the whole internet."

We don't need to ingest the whole internet. I'd wager that upwards of 75% of the internet is spam, which would be useless for LLM training purposes. By the way, spam and useless information on the internet is only going to get worse, largely thanks to LLMs.

Only a subset of the internet contains "useful" information, an even smaller subset contains information which is "clean enough" to be used for training purposes, and a smaller subset still can be legally scraped and used for training purposes.

It's highly likely that we reached "peak training data" a long time ago for many of the areas of knowledge and activity that are available on the internet.