Comment by reddalo

1 day ago

Off topic, but I'm always amazed by Archive.md/.is/whatever. To this day I don't understand how they manage to bypass a lot of paywalls.

The mystery about the owner makes it even more intriguing.

I think the archive mostly has news, random articles and such.

And as they say, nothing is more worthless than yesterday's news.

I assume they just pretend to be Googlebot, so the site serves them the full text.

  • Won’t work for any popular site. You can easily test this by using an extension to set your user agent. If you're not checking the public list of IPs that Google publishes for its crawler, you're doing it wrong.
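A rough sketch of the check described in that reply: trust a "Googlebot" user agent only if the request IP falls inside the published crawler ranges. It assumes the ranges arrive as a parsed JSON document with a `prefixes` list of `ipv4Prefix`/`ipv6Prefix` entries (my understanding of the shape of Google's published file; verify against the current one). The function names are hypothetical, and the sample ranges in the usage below are documentation addresses, not Google's.

```python
import ipaddress


def load_ranges(doc: dict) -> list:
    """Turn a parsed ranges document (assumed shape) into ip_network objects."""
    nets = []
    for entry in doc.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            nets.append(ipaddress.ip_network(prefix))
    return nets


def is_real_googlebot(user_agent: str, remote_ip: str, ranges: list) -> bool:
    """Hypothetical helper: a spoofed UA from an unlisted IP is rejected."""
    if "Googlebot" not in user_agent:
        return False
    ip = ipaddress.ip_address(remote_ip)
    # Only compare like with like (IPv4 vs IPv6).
    return any(ip.version == net.version and ip in net for net in ranges)


if __name__ == "__main__":
    sample = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"},
                           {"ipv6Prefix": "2001:db8::/32"}]}
    ranges = load_ranges(sample)
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_real_googlebot(ua, "192.0.2.10", ranges))   # listed IP
    print(is_real_googlebot(ua, "203.0.113.5", ranges))  # spoofed UA
```

This is why simply changing the user agent in a browser extension gets you nowhere: the UA string is free to fake, but the source IP is not.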

Maybe they have a paid account? I don’t think there’s much magic behind it.

  • Publications could use watermarking to encode the name of the account an article is being served to, but they don't seem to. I wonder why.
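One possible way a publisher might do this, purely as an illustration and not something any publication is known to use: hide the account name in invisible zero-width characters inserted into the article text. The encoding below (U+200B for a 0 bit, U+200C for a 1 bit) and the function names are hypothetical choices for the sketch.

```python
ZERO = "\u200b"  # zero-width space encodes bit 0 (arbitrary choice)
ONE = "\u200c"   # zero-width non-joiner encodes bit 1 (arbitrary choice)


def embed(text: str, account: str) -> str:
    """Hide the account name as invisible bits after the first space."""
    bits = "".join(f"{b:08b}" for b in account.encode("utf-8"))
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + sep + mark + tail


def extract(text: str) -> str:
    """Recover the account name from the zero-width characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    marked = embed("Breaking news today", "alice42")
    print(extract(marked))
```

A scheme this naive is trivial to defeat (an archiver can strip every zero-width character before saving), which may be part of why publishers don't bother. More robust approaches vary wording or whitespace per reader, at the cost of more complexity.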

Given how many people must find its existence incredibly infuriating, it's odd that it isn't being chased down more aggressively than The Pirate Bay was. I mean, I'm glad it's not, but I'm kinda surprised.

  • There have been some DNS resolver issues: some resolvers won't return the addresses for sites like archive.is or Anna's Archive.