Comment by XorNot

8 hours ago

It's weird how people keep saying the rise of AI-generated content will "kill AI", as though the companies training models don't have complete archives of all the data they already scraped from the Internet.

It doesn't take all the text of the public Internet for someone to learn to talk, and all these companies are much more in the data curation business for the purposes of teaching models.

Scraping is either to keep models up to date on current events (which has obvious alternative sources), or it's done by startups that don't already have such datasets.

I can't wait to see how coding performance starts to drop with newer tools and versions, as people no longer discuss them in the same detail and quantity as they used to. People using LLMs will be stuck with pre-2023 tools; using new stuff is already an uphill battle (you have to give it the correct docs manually).