Comment by loeg

19 days ago

I'd kind of like to see that claim substantiated a little more. Is it all crawlers that switch to a non-bot UA, or how are they determining it's the same bot? What non-bot UA do they claim?

> Is it all crawlers that switch to a non-bot UA

I've observed only one of them do this with high confidence.

> how are they determining it's the same bot?

It's fairly easy to determine that it's the same bot: as soon as I blocked the "official" one, a bunch of AWS IPs started crawling the same URL patterns, in this case MediaWiki's diff view (`/wiki/index.php?title=[page]&diff=[new-id]&oldid=[old-id]`), which absolutely no bot had ever crawled before.
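For anyone who wants to run the same kind of check on their own logs, here is a minimal sketch. It assumes the access log is in the common "combined" format; the sample lines, the IPs (documentation ranges) and the `ExampleBot` name are made up for illustration and are not taken from the actual server.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed "combined" log format: ip, ident, user, [time], "request", status, size, "referer", "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def is_diff_view(target):
    """True if the request target looks like MediaWiki's diff view."""
    parts = urlsplit(target)
    if parts.path != "/wiki/index.php":
        return False
    return {"title", "diff", "oldid"} <= set(parse_qs(parts.query))

def diff_view_hits(lines):
    """Count diff-view requests per (client IP, user agent) pair."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and is_diff_view(match.group("target")):
            hits[(match.group("ip"), match.group("ua"))] += 1
    return hits

# Hypothetical sample data, only to show the shape of the output.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /wiki/index.php?title=Foo&diff=20&oldid=10 HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.9 - - [01/Jan/2025:00:05:00 +0000] "GET /wiki/index.php?title=Foo&diff=21&oldid=20 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome"',
    '198.51.100.9 - - [01/Jan/2025:00:05:02 +0000] "GET /wiki/index.php?title=Bar&diff=22&oldid=21 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome"',
    '192.0.2.4 - - [01/Jan/2025:00:06:00 +0000] "GET /wiki/Main_Page HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, ua), count in diff_view_hits(SAMPLE).most_common():
        print(count, ip, ua)
```

Comparing which IPs hit this pattern before and after a block goes in is what makes the correlation visible; the counts alone don't prove who operates the crawler.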

> What non-bot UA do they claim?

Latest Chrome on Windows.

Presumably they switch UA to Mozilla/something but tell on themselves by still using the same IP range or ASN. Unfortunately this has become common practice for feed readers as well.

I would take anything the author said with a grain of salt. They straight up lied about the configuration of the robots.txt file.

https://news.ycombinator.com/item?id=42551628

  • How do you know what the contextual configuration of their robots.txt is/was?

    Your accusation was directly addressed by the author in a comment on the original post, IIRC.

    I find your attitude as expressed here to be problematic in many ways.

    • CommonCrawl archives robots.txt

      For convenience, you can view the extracted data here:

      https://pastebin.com/VSHMTThJ

      You are welcome to verify for yourself by searching for “wiki.diasporafoundation.org/robots.txt” in the CommonCrawl index here:

      https://index.commoncrawl.org/

      The index contains a file name that you can append to the CommonCrawl URL to download the archive and view it.

      More detailed information on downloading archives here:

      https://commoncrawl.org/get-started
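The lookup steps above can be sketched roughly as follows. This is my understanding of the index API rather than anything authoritative: the `{crawl_id}-index` endpoint shape, the `filename`/`offset`/`length` fields and the `data.commoncrawl.org` host should be checked against the get-started page, and the crawl ID you pass in is your own choice (the one in the usage comment is only an example).

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"  # assumed download host for archive files

def index_query_url(crawl_id, page_url):
    """Build the index query URL for one crawl and one page URL."""
    query = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"

def byte_range(offset, length):
    """HTTP Range header value covering a single record in the archive."""
    start = int(offset)
    return f"bytes={start}-{start + int(length) - 1}"

def fetch_capture(crawl_id, page_url):
    """Fetch the first matching capture; needs network access."""
    with urlopen(index_query_url(crawl_id, page_url)) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    hit = json.loads(lines[0])  # one JSON object per line
    request = Request(
        f"{DATA_HOST}/{hit['filename']}",
        headers={"Range": byte_range(hit["offset"], hit["length"])},
    )
    with urlopen(request) as resp:
        # Each record is its own gzip member, so the slice decompresses alone.
        return gzip.decompress(resp.read()).decode("utf-8", errors="replace")

# Example usage (not run here because it needs the network):
# print(fetch_capture("CC-MAIN-2024-51", "wiki.diasporafoundation.org/robots.txt"))
```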

      From September to December, the robots.txt at wiki.diasporafoundation.org contained this, and only this:

      > User-agent: *
      > Disallow: /w/
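To see what those two lines do and do not cover, here is a small sketch using Python's standard `urllib.robotparser`. The `ExampleBot` agent name and the `title`, `diff` and `oldid` values are placeholders of mine; only the two rules come from the archived file.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, and nothing else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

def allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://wiki.diasporafoundation.org" + path)

# Placeholder values in the diff-view URL shape mentioned upthread.
DIFF_VIEW = "/wiki/index.php?title=Example&diff=2&oldid=1"

if __name__ == "__main__":
    print("diff view allowed:", allowed(DIFF_VIEW))
    print("/w/index.php allowed:", allowed("/w/index.php"))
```

Under this parser's prefix matching, `Disallow: /w/` does not match paths beginning with `/wiki/`, so the diff-view URL comes back as allowed while anything under `/w/` does not.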

      Apologies for my attitude; I find defenders of the dishonest in the face of clear evidence even more problematic.
