Comment by _ikke_
8 days ago
It affects many open source projects as well; they just scrape everything repeatedly, with abandon.
First from known networks, then from residential IPs. First with dumb HTTP clients, now with full-blown headless Chrome browsers.
Well, I can parse my nginx logs and I don't see that happening, so I'm not convinced. I suppose my websites aren't the most discoverable, but the number of bogus connections sshd rejects is an order of magnitude or three higher than the number of unknown connections my web server gets. Today I received requests from two whole clients in US data centers, so either scrapers are far more selective than you claim, or they are nowhere near the indie-web killer OP purports them to be.
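For what it's worth, here's roughly how I eyeball it. A minimal sketch, assuming nginx's default "combined" log format (client IP as the first whitespace-separated field) and a placeholder log path; adjust both for your setup:

```python
#!/usr/bin/env python3
# Minimal sketch: tally unique client IPs in an nginx access log.
# Assumes the default "combined" format, where the client IP is the
# first whitespace-separated field. LOG_PATH is a placeholder.
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # placeholder; point at your log

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        fields = line.split(maxsplit=1)
        if fields:
            counts[fields[0]] += 1

# Busiest clients float to the top. A distributed scraper tends to show
# up as a long tail of many IPs with similar volumes; a single dumb bot
# shows up as one dominant IP.
for ip, n in counts.most_common(20):
    print(f"{n:8d}  {ip}")
```

A few lines of awk would do the same; the point is that a distributed scraper farm looks like a long tail of distinct client IPs, and I'm just not seeing that tail.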
I've worked with a company that had to invest in scraper-traffic mitigation, so I'm not disputing that it happens at high enough volume to be a problem for content aggregators. But for small, independent, non-commercial websites, I'll stick with my original hypothesis unless I come across contradictory evidence.