Comment by alphan0n

19 days ago

I would take anything the author said with a grain of salt. They straight up lied about the configuration of the robots.txt file.

https://news.ycombinator.com/item?id=42551628

How do you know what the contextual configuration of their robots.txt is/was?

Your accusation was directly addressed by the author in a comment on the original post, IIRC

I find your attitude as expressed here to be problematic in many ways.

  • CommonCrawl archives robots.txt

    For convenience, you can view the extracted data here:

    https://pastebin.com/VSHMTThJ

    You are welcome to verify for yourself by searching for “wiki.diasporafoundation.org/robots.txt” in the CommonCrawl index here:

    https://index.commoncrawl.org/

    Each index result contains a file name that you can append to the CommonCrawl URL to download and view the archive.

    More detailed information on downloading archives here:

    https://commoncrawl.org/get-started

    From September to December, the robots.txt at wiki.diasporafoundation.org contained this, and only this:

    >User-agent: *
    >Disallow: /w/

    Apologies for my attitude; I find defenders of the dishonest in the face of clear evidence even more problematic.
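For anyone who wants to script the check described above, here is a minimal sketch rather than a definitive tool. It builds the index query for the robots.txt URL, turns one index result into a ranged download request, and evaluates the two-line rule set quoted above. Several things are assumptions and not from the comment: the helper names (`index_query_url`, `archive_request`, `allows`) are hypothetical, the crawl label is whichever one you pick from the index page, `https://data.commoncrawl.org/` is assumed to be the download prefix the comment refers to, and the index rows are assumed to carry `filename`, `offset`, and `length` fields.

```python
import urllib.parse
import urllib.request
from urllib.robotparser import RobotFileParser

INDEX_BASE = "https://index.commoncrawl.org/"
DATA_BASE = "https://data.commoncrawl.org/"  # assumed download prefix


def index_query_url(crawl_id: str, target: str) -> str:
    """Build an index query for one URL in one crawl, asking for JSON output."""
    query = urllib.parse.urlencode({"url": target, "output": "json"})
    return f"{INDEX_BASE}{crawl_id}-index?{query}"


def archive_request(record: dict) -> urllib.request.Request:
    """Build a ranged request for the single archive record an index row points to."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # HTTP byte ranges are inclusive
    return urllib.request.Request(
        DATA_BASE + record["filename"],
        headers={"Range": f"bytes={start}-{end}"},
    )


def allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Fetching `index_query_url(...)` should return one JSON object per line; passing one of those objects to `archive_request` and opening it with `urllib.request.urlopen` should yield gzip-compressed bytes that `gzip.decompress` can unpack. With the rules quoted above, `allows` reports that paths under `/w/` are disallowed for every user agent and that everything else is permitted.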