Comment by alphan0n

19 days ago

https://web.archive.org/web/20240101000000*/https://wiki.dia...

notice how there's a period of almost two months with no new index, just until a week before I posted this? I wonder what might have caused this!!1

(and it's not like they only check robots.txt once a month or so. https://stuff.overengineer.dev/stash/2024-12-30-dfwiki-opena...)

  • :/ Common Crawl archives robots.txt and indicates that the file at wiki.diasporafoundation.org was unchanged in November and December from what it is now. Unchanged from September, in fact.

    https://pastebin.com/VSHMTThJ

    https://index.commoncrawl.org/

    • just for you, I redeployed the old robots.txt (with an additional log-honeypot). I even manually submitted it to the web archive just now so you have something to look at: https://web.archive.org/web/20241231041718/https://wiki.dias...

      they ingested it twice since I deployed it. they still crawl those URLs - and I'm sure they'll continue to do so - as others in that thread have confirmed exactly the same. I'll be traveling for the next couple of days, but I'll check the logs again when I'm back.

      of course, I'll still see accessed from them, as most others in this thread do, too, even if they block them via robots.txt. but of course, that won't stop you from continuing to claim that "I lied". which, fine. you do you. luckily for me, there are enough responses from other people running medium-sized web stuffs with exactly the same observations, so I don't really care.

      7 replies →