Comment by SchemaLoad
12 hours ago
It's because the AI overview is, most of the time, directly summarising the search results rather than synthesizing an answer from internal model knowledge. That's why it can hyperlink the sources for its facts now. Even a very dumb lightweight model can extract relevant text from articles.
I just can't see how this is sustainable, since they are stealing from the very sources that are now getting defunded.
> I just can't see how this is sustainable since they are stealing from the sources who are now getting defunded.
Yeah, that's why I said I don't know where the internet is heading.
You can see the fall in real time - half the sources are also dubious AI slop now and that number’s only growing :-/
At work the conversation is that everyone is using LLMs now, yet we simultaneously receive virtually no traffic through them. The LLMs scrape our data, provide an answer to the user, and we see nothing from it.
I have the same worry about LLMs in general - I know that ‘model collapse’ seems to be an unfashionable idea, but when the internet’s just full of garbage (soon?…), what are we going to train these things on?
How often are they scraping?
Also generally wondering… Do labs view scraping as legally safer than trying to cache the Internet? I figure it's easy to mark certain content as all but evergreen, with a quick secondary check for possible new news.
Maybe caching everything is too expensive?
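The "evergreen with a quick secondary check" idea could be sketched as a TTL cache where stable pages get a long lifetime and newsy pages a short one, revalidating only on expiry. This is purely a hypothetical illustration — the class, TTL values, and policy are made up, not how any lab actually does it:

```python
import time

class PageCache:
    """Toy cache: evergreen pages get a long TTL, fast-moving pages a
    short one. On expiry, the caller would do a cheap revalidation
    (e.g. a conditional GET) instead of a full re-scrape.
    All names and TTL values here are illustrative only."""

    EVERGREEN_TTL = 30 * 24 * 3600   # ~a month for stable reference pages
    NEWS_TTL = 15 * 60               # 15 minutes for fast-moving pages

    def __init__(self):
        self._store = {}  # url -> (content, fetched_at, evergreen)

    def put(self, url, content, evergreen=False):
        self._store[url] = (content, time.time(), evergreen)

    def needs_refresh(self, url):
        """True if the page is missing or its TTL has lapsed."""
        entry = self._store.get(url)
        if entry is None:
            return True
        _, fetched_at, evergreen = entry
        ttl = self.EVERGREEN_TTL if evergreen else self.NEWS_TTL
        return time.time() - fetched_at > ttl

    def get(self, url):
        content, _, _ = self._store[url]
        return content
```

Even with a scheme like this, the storage bill for "the whole Internet" may dwarf the cost of just re-scraping hot pages, which would explain preferring scraping over caching.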