Comment by tonymet

4 days ago

I’m an outsider with experience building crawlers. You can get pretty far with residential proxies and browser fingerprint optimization. Most of the b-tier publishers use RBC and heuristics that can be “worked around” with moderate effort.

... but what about subscription-only, paywalled sources?

  • Many publishers offer "first one's free".

    For those that don't, I would guess archive.today is using malware to piggyback off of subscriptions.