Comment by lblume
1 day ago
Given that current LLMs do not consistently output total garbage, and can be used as judges in a fairly efficient way, I highly doubt this could, even in theory, have any impact on the capabilities of future models. Once (a) models are capable enough to distinguish between semi-plausible garbage and possibly relevant text and (b) companies are aware of the problem, I do not think data poisoning will be an issue at all.
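A minimal sketch of what such judge-based filtering might look like. Everything here is hypothetical: `query_judge_model`, the 1-5 scoring prompt, and the threshold are illustrative stand-ins, not any lab's actual pipeline.

```python
# Illustrative sketch only: the judge call and scoring scheme are assumptions.

JUDGE_PROMPT = (
    "Rate the following text from 1 (incoherent garbage) to 5 "
    "(plausibly useful training data). Reply with a single digit.\n\n{text}"
)

def query_judge_model(prompt: str) -> str:
    """Placeholder for a call to a small, cheap judge model.

    Returns a canned score so the sketch runs; in practice this would
    hit whatever inference endpoint the crawler operator uses.
    """
    return "3"

def filter_corpus(documents: list[str], threshold: int = 3) -> list[str]:
    """Keep only documents the judge scores at or above `threshold`."""
    kept = []
    for doc in documents:
        # Truncate each document so the judge pass stays cheap.
        reply = query_judge_model(JUDGE_PROMPT.format(text=doc[:2000]))
        try:
            score = int(reply.strip()[0])
        except (ValueError, IndexError):
            continue  # unparseable judge output: drop the document
        if score >= threshold:
            kept.append(doc)
    return kept
```

The asymmetry the comment relies on is that a single pass by a cheap judge model per document costs far less than pretraining on poisoned data, so "total garbage" can plausibly be screened out before it matters.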
Yes, but you still waste their processing power.
> Once (a) models are capable enough to distinguish between semi-plausible garbage and possibly relevant text
https://xkcd.com/810/
There's no evidence that the current global DDoS is related to AI.
We have investigated nobody and found no evidence of malpractice!
The linked page claims that most of the identified crawlers are scraping training data for LLMs, which seems likely.