Comment by gary_0

1 day ago

Neither the Marginalia search engine nor archive.org deserves such treatment--they're performing a public service that benefits everyone, for free. And it's generally not in one's best interests to serve a bunch of garbage to Google's or Bing's crawlers, either.

It's not really that big of a problem for a well-implemented crawler. You basically need to define upper bounds on both document count and wall-clock time for your crawls, since crawler traps are pretty common and have been around since the Cretaceous.
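A minimal sketch of that bounding idea (the `fetch` and `extract_links` callables are hypothetical placeholders for whatever your crawler actually uses):

```python
import time
from collections import deque

def crawl(seed_urls, fetch, extract_links, max_docs=10_000, max_seconds=3600):
    """Breadth-first crawl bounded by document count and wall-clock time.

    The two bounds are what defuse crawler traps: even an infinite
    link maze can only cost you max_docs fetches or max_seconds, whichever
    comes first.
    """
    deadline = time.monotonic() + max_seconds
    seen = set(seed_urls)
    queue = deque(seed_urls)
    docs = []
    while queue and len(docs) < max_docs and time.monotonic() < deadline:
        url = queue.popleft()
        body = fetch(url)
        docs.append((url, body))
        for link in extract_links(url, body):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return docs
```

Against a trap that mints fresh URLs on every page, this loop still terminates after exactly `max_docs` fetches.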

If you run a normal website, then you will just serve normal data. But it seems perfectly legit to serve fake random gibberish from your website if you want to. A human would just stop reading it.