Comment by reddalo
21 hours ago
Off topic, but I'm always amazed by Archive.md/.is/whatever. To this day I don't understand how they manage to bypass a lot of paywalls.
The mystery about the owner makes it even more intriguing.
I think archive has mostly news, random articles and such.
And as they say nothing is more worthless than yesterday's news.
I assume they just pretend to be the Googlebot so the site just gives the text.
Won’t work for any popular site. You can try that easily by using extensions to set the user agent. If you are not checking the public list of IPs that Google publishes for the crawler you are doing it wrong.
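To make the IP check concrete, here is a minimal sketch of what a site might do, assuming the published list is JSON with `ipv4Prefix` / `ipv6Prefix` entries. The function names and the sample ranges are made up for illustration (the ranges are reserved documentation blocks, not real crawler addresses).

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks, NOT real
# Googlebot ranges. A real check would load the published list instead.
SAMPLE_PREFIXES = [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"},
]


def parse_prefixes(entries):
    """Turn prefix entries (assumed shape) into ip_network objects."""
    networks = []
    for entry in entries:
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_trusted_crawler(user_agent, remote_ip, networks):
    """Accept the request only if the UA claims Googlebot AND the IP is listed."""
    if "Googlebot" not in user_agent:
        return False
    addr = ipaddress.ip_address(remote_ip)
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in networks)
```

With a check like this, a spoofed user agent from an unlisted address is treated like any other visitor, which is why changing the user agent in an extension does nothing on such sites.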
Maybe they have a paid account? I don’t think there’s much magic behind it.
Publications could use watermarking to encode the name of the account an article is being served to, but they don't seem to. I wonder why.
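For illustration only, one naive way such a watermark could work is hiding the account name in zero-width characters. This is a sketch of the general idea, not something any publication is known to do; `embed_account` and `extract_account` are hypothetical names.

```python
ZERO = "\u200b"  # zero-width space, used here as bit 0
ONE = "\u200c"   # zero-width non-joiner, used here as bit 1


def embed_account(text, account_id):
    """Hide account_id as invisible characters after the first word."""
    bits = "".join(f"{byte:08b}" for byte in account_id.encode("utf-8"))
    mark = "".join(ONE if b == "1" else ZERO for b in bits)
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract_account(text):
    """Recover the hidden account_id from a marked text."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore any trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")
```

A scheme this simple disappears as soon as the zero-width characters are stripped or the text is re-rendered as a screenshot, so a real watermark would need to be more robust than this.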
I just assumed they copied it into their own db
thetimes.com has a paywall if you visit it from the UK, and full content if you are in the US.
So US-based archive.org "bypasses" this paywall as well:
https://web.archive.org/web/https://www.thetimes.com/culture...
Given how many people must find its existence incredibly infuriating, it's so odd that it's not being chased down with more haste than The Pirate Bay was. I mean I'm glad it's not, but kinda surprised.
There have been some DNS resolver issues: some DNS resolvers won't return addresses for sites like archive.is or Anna's Archive.
The music or movie industry lobby is much more aggressive I’d assume.