Because it works too reliably. Imagine what that would entail: managing thousands of accounts. You would need to strip the account details from the archived pages perfectly, every time. Every time a website changes its code even slightly, you'd be at risk of losing one of your accounts. It would constantly break and would be an absolute nightmare to maintain. I've personally never encountered such a failure on a paywalled news article; archive.today has given me a clean, non-paywalled version every single time.
Maybe they use accounts for some special sites. But there is definitely some automated, generic magic happening that manages to bypass the paywalls of news outlets. Probably something Googlebot-related, because those websites usually serve Google their news pages without a paywall, presumably for SEO reasons.
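For what it's worth, the Googlebot theory is easy to sketch. This is a hypothetical illustration, not a claim about what archive.today actually does: the idea is just that many outlets decide whether to show the paywall based on the request's User-Agent. (Sites that verify crawler IPs via reverse DNS wouldn't be fooled by the header alone.)

```python
import urllib.request

# Google's published Googlebot User-Agent string.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def fetch_as_googlebot(url: str) -> str:
    """Fetch a page while presenting ourselves as Google's crawler."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

If a site serves the full article to this request but a paywall stub to a normal browser UA, the SEO theory holds for that site.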
I could be wrong, but I think I've seen it fail on more obscure sites. But yeah, it seems unlikely they're maintaining that many premium accounts. On the other hand, they could simply be state-backed. Let's say there are 1,000 likely-paywalled sites, 20 accounts each = 20,000 accounts; at $10/month that's $200k/month, or $2.4m a year. If I were an intelligence agency I'd happily drop that plus operating costs to own half the archived content on the internet.
Surely it wouldn't be too hard to test. Just set up an unlisted dummy paywall site, archive it a few times, and see what the requests look like.
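A minimal version of that experiment could look like this: an unlisted page that serves a "paywalled" stub and logs every incoming request's headers, so you can see exactly what the archiver's fetcher sends (User-Agent, cookies, etc.). The port and response body here are arbitrary choices for the sketch.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class PaywallProbe(BaseHTTPRequestHandler):
    """Serve a fake paywall stub and log each request's headers."""

    def do_GET(self):
        # The interesting part: what does the archiver identify itself as?
        print(self.client_address[0], dict(self.headers))
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<h1>Subscribe to continue reading</h1>")

if __name__ == "__main__":
    HTTPServer(("", 8000), PaywallProbe).serve_forever()
```

Submit the URL to archive.today, then compare the logged headers and source IPs against a normal browser visit and against Google's published crawler ranges.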
Interesting theory. It would also be a good way to subtly undermine the viability of news outlets, not to mention the insidious potential of altering snapshots at will. OTOH, I'd expect a state-sponsored effort to be more professional in terms of not threatening and smearing some blogger who questioned them.
Using two or more accounts could help you automatically strip account details.
That's actually a really neat idea.
Replace any identifiers like usernames and emails with another string automatically.
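The two-account idea could be sketched like this: fetch the same article from two different accounts, and treat anything that differs between the two copies as account-specific (username, email, session token) to be replaced with a placeholder. This is just an illustration of the approach, not archive.today's actual pipeline.

```python
import difflib

def scrub_account_details(page_a: str, page_b: str,
                          placeholder: str = "[scrubbed]") -> str:
    """Keep what two accounts' copies of a page agree on; blank the rest."""
    matcher = difflib.SequenceMatcher(a=page_a, b=page_b, autojunk=False)
    out = []
    for op, a0, a1, _b0, _b1 in matcher.get_opcodes():
        if op == "equal":
            out.append(page_a[a0:a1])   # identical across accounts: keep
        else:
            out.append(placeholder)     # differs: likely an identifier
    return "".join(out)
```

The nice property is that it's site-agnostic: no per-site rules about where the username appears, which is exactly the maintenance burden the parent comments worry about.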