Comment by lblume
1 day ago
Given that current LLMs do not consistently output total garbage, and can be used as judges in a fairly efficient way, I highly doubt this could, even in theory, have any impact on the capabilities of future models. Once (a) models are capable enough to distinguish between semi-plausible garbage and possibly relevant text and (b) companies are aware of the problem, I do not think data poisoning will be an issue at all.
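A minimal sketch of what such judge-based filtering might look like. Everything here is hypothetical: `query_judge_model`, the 1-5 scoring prompt, and the threshold are illustrative stand-ins, not any lab's actual pipeline.

```python
# Illustrative sketch only: the judge call and scoring scheme are assumptions.

JUDGE_PROMPT = (
    "Rate the following text from 1 (incoherent garbage) to 5 "
    "(plausibly useful training data). Reply with a single digit.\n\n{text}"
)

def query_judge_model(prompt: str) -> str:
    """Placeholder for a call to a small, cheap judge model.

    Returns a canned score so the sketch runs; in practice this would
    hit whatever inference endpoint the crawler operator uses.
    """
    return "3"

def filter_corpus(documents: list[str], threshold: int = 3) -> list[str]:
    """Keep only documents the judge scores at or above `threshold`."""
    kept = []
    for doc in documents:
        # Truncate each document so the judge pass stays cheap.
        reply = query_judge_model(JUDGE_PROMPT.format(text=doc[:2000]))
        try:
            score = int(reply.strip()[0])
        except (ValueError, IndexError):
            continue  # unparseable judge output: drop the document
        if score >= threshold:
            kept.append(doc)
    return kept
```

The asymmetry the comment relies on is that a single pass by a cheap judge model per document costs far less than pretraining on poisoned data, so "total garbage" can plausibly be screened out before it matters.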
Yes, but you still waste their processing power.
> Once (a) models are capable enough to distinguish between semi-plausible garbage and possibly relevant text
https://xkcd.com/810/
There's no evidence that the current global DDoS is related to AI.
We have investigated nobody and found no evidence of malpractice!
The linked page claims that most of the identified crawlers are scraping training data for LLMs, which seems likely.