
Comment by 1vuio0pswjnm7

2 days ago

"An AI is a bag that contains basically all words ever written, at least the ones that could be scraped off the internet or scanned out of a book."

The quantitative and qualitative difference between (a) "all words ever written" and (b) "ones that could be scraped off the internet or scanned out of a book" easily exceeds the size of any LLM

Compared to (a), (b) is a tiny pouch, not even a bag

Opinions may differ on whether (b) is a representative sample of (a)

The words "scanned out of a book" would seem to be the most useful IMHO, but the AI companies do not have enough words from those sources to produce useful general-purpose LLMs

They have to add words "that could be scraped off the internet" which, let's be honest, are mostly garbage