Comment by reddalo

1 day ago

Off topic, but I'm always amazed by Archive.md/.is/whatever. To this day I don't understand how they manage to bypass a lot of paywalls.

The mystery about the owner makes it even more intriguing.

LordHeini 13 hours ago

I think archive has mostly news, random articles and such.

And as they say, nothing is more worthless than yesterday's news.

amouat 21 hours ago

I assume they just pretend to be Googlebot, so the site serves them the full text.

dewey 19 hours ago

That won't work on any popular site. You can easily test it with an extension that changes your user agent. If you're not checking requests against the public list of crawler IPs that Google publishes, you're doing it wrong.
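A minimal sketch of the server-side check described above, in Python. It assumes the site has already loaded the crawler IP ranges Google publishes; the prefixes below are placeholder documentation ranges, not Google's actual list, and `is_verified_crawler` is a hypothetical helper name.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 / RFC 3849 documentation ranges), NOT Google's real list.
# A real site would load these from the crawler IP list Google publishes.
CRAWLER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_verified_crawler(user_agent: str, remote_ip: str) -> bool:
    """Trust a 'Googlebot' user agent only if the request IP is in a published range."""
    if "Googlebot" not in user_agent:
        return False
    ip = ipaddress.ip_address(remote_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(ip in net for net in CRAWLER_PREFIXES)


ua = "Mozilla/5.0 (compatible; Googlebot/2.1)"
print(is_verified_crawler(ua, "192.0.2.10"))   # inside a listed prefix
print(is_verified_crawler(ua, "203.0.113.5"))  # spoofed user agent from an unlisted address
```

A request claiming to be Googlebot from an address outside the list is rejected, which is why spoofing the user agent alone fails against sites that do this check.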

silcoon 19 hours ago

Maybe they have a paid account? I don't think there's much magic behind it.

blast 10 hours ago

Publications could use watermarking to encode the name of the account an article is being served to, but they don't seem to. I wonder why.
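A toy sketch of what such watermarking could look like, assuming a hypothetical scheme that hides the account ID as zero-width Unicode characters in the served text. Real publishers may use different methods, if any, and this particular scheme is easy to defeat by stripping non-printing characters. `embed` and `extract` are illustrative names.

```python
from typing import Optional

ZERO = "\u200b"  # zero-width space: encodes bit 0
ONE = "\u200c"   # zero-width non-joiner: encodes bit 1


def embed(text: str, account_id: int, bits: int = 32) -> str:
    """Hide account_id (most significant bit first) right after the first word."""
    payload = "".join(ONE if (account_id >> i) & 1 else ZERO for i in reversed(range(bits)))
    first, sep, rest = text.partition(" ")
    return first + payload + sep + rest


def extract(text: str) -> Optional[int]:
    """Rebuild the account ID from the zero-width characters, or None if absent."""
    marks = [c for c in text if c in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | (1 if c == ONE else 0)
    return value


marked = embed("Breaking news from the capital", 4242)
print(extract(marked))
```

The watermarked text looks identical on screen, but a leaked copy would point back to the account it was served to, unless the copier normalizes the text first.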

jama211 1 day ago

I just assumed they copied it into their own db.

ventegus 11 hours ago

thetimes.com has a paywall if you visit it from the UK, and full content if you are in the US.

So the US-based archive.org "bypasses" this paywall as well:

https://web.archive.org/web/https://www.thetimes.com/culture...

moffkalast 21 hours ago

Given how many people must find its existence incredibly infuriating, it's odd that it isn't being chased down more aggressively than The Pirate Bay was. I mean, I'm glad it's not, but I'm kinda surprised.

nosafemode 15 hours ago

There have been some DNS resolver issues: some resolvers won't return addresses for sites like archive.is or Anna's Archive.

dewey 19 hours ago

The music and movie industry lobbies are much more aggressive, I'd assume.