Comment by 1vuio0pswjnm7

19 days ago

Alternative to archive.ph, no Javascript, no CAPTCHAs:

   x=www.washingtonpost.com 
   { 
   printf 'GET /technology/2026/02/07/ai-spending-economy-shortages/ HTTP/1.1\r\n'
   printf 'Host: %s\r\n' "$x"
   printf 'User-Agent: Chrome/115.0.5790.171 Mobile Safari/537.36 (compatible ; Googlebot/2.1 ; +http://www.google.com/bot.html)\r\n'
   printf 'X-Forwarded-For: 66.249.66.1\r\n\r\n'
   }|busybox ssl_client -n $x $x > 1.htm
   firefox ./1.htm
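
Note that 1.htm as saved above starts with the HTTP status line and response headers. A minimal sketch for dropping them before opening the file; strip_headers is a hypothetical helper name, and it assumes the body is not sent with chunked transfer encoding (requesting HTTP/1.0 instead of HTTP/1.1 should avoid chunking):

```shell
# Hypothetical helper, not part of busybox.
# Removes carriage returns, then deletes everything from the status line
# through the first blank line, leaving only the response body.
strip_headers() {
    tr -d '\r' | sed '1,/^$/d'
}

# Canned response for illustration; in practice, pipe the ssl_client output.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>\r\n' | strip_headers
```

Usage would be something like putting `| strip_headers` between the ssl_client command and the redirect to 1.htm.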

Anonymous middleman that could potentially collect browsing histories, serves CAPTCHAS, requires Javascript, maybe in crosshairs of authorities, unreliable according to some commenters (YMMV). NB. Nothing about "honeypot", just observations (cf. "accusations")

Someone recently noticed an apparent DDOS attempt on some blogger using the Javascript fetch function

The site used to include a tracking pixel containing the visitor's IP address

Also used to ping mail.ru

Would need to look at the page source again to see what it contains today

It's a crowd favorite

People love it

  • A "Firefox app" could "access browsing history", "steal cookies", whatever unless (a) the user removes the app's ability to "phone home" from the source code then compiles the app themselves and (b) the user controls the servers to which the app is allowed to connect

    For example, Firefox app by default, without any input from the user, tries to make connections to Mozilla servers such as

       content-signature-2.cdn.mozilla.net
       firebaseremoteconfig.googleapis.com
       firefox.settings.services.allizom.org
       firefox.settings.services.mozilla.com
       services.addons.mozilla.org
       detectportal.firefox.com
       contile.services.mozilla.com
    

    The opportunities for Mozilla app developers to send data, e.g., browsing history, cookies, usage statistics, crash reports, empty requests (pings), whatever, to Mozilla or to any third party are without limit unless (a) and (b) are addressed

    Even uBlock Origin tries to connect to ublockorigin.pages.dev by default, i.e., without any input from the user

    The user might want this connection to occur but because it is a default the user might also not even know the connection is being made. These connections are a developer choice not a user choice. The user might agree with the choice
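
A rough sketch of one partial way to approach (b): turn the hostnames listed above into hosts-file style entries that resolve to a non-routable address. block_entries is a hypothetical helper name, and 0.0.0.0 as the sink address is an assumption. This only affects name resolution through the system resolver; an app that does its own DNS lookups would bypass it, and it does nothing about (a).

```shell
# Hypothetical helper: print one hosts-file entry per hostname argument.
# 0.0.0.0 is an assumed sink address; adjust to taste.
block_entries() {
    for h in "$@"; do
        printf '0.0.0.0 %s\n' "$h"
    done
}

# Hostnames copied from the list above.
block_entries \
    content-signature-2.cdn.mozilla.net \
    firebaseremoteconfig.googleapis.com \
    firefox.settings.services.allizom.org \
    firefox.settings.services.mozilla.com \
    services.addons.mozilla.org \
    detectportal.firefox.com \
    contile.services.mozilla.com
```

The output goes to stdout so it can be reviewed before being appended to a hosts file.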

  • Third party anonymous middleman that can observe what site(s) a user wants to read

    Keywords: anonymous, third party

    The websites with the webpages that a user seeks to read, e.g., some page on www.washingtonpost.com, are not third party websites. They are "first party" websites

    Other archives are third parties but are generally not run by anonymous operators that keep shifting between different IP addresses and domain names

    Other archives generally do not serve CAPTCHAs or require Javascript

    Will provide examples if requested

    No Anubis:

       {
       printf 'GET / HTTP/1.0\r\n'
       printf 'Host: www.kernel.org\r\n\r\n'
       }|busybox ssl_client 146.75.109.55
    
    

    NB. Replace 205.1.1.1 with user's IP address, replace cc with country code, replace 123456789 with some 9-digit number

    </script></div></div><img style="position:absolute" width="1" height="1" src="https://205.1.1.1.cc.VSY1.123456789.pixel.archive.md/x.gif"><script type="text/javascript">

    This is from December 2024. May have changed since then
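
To make the pixel hostname easier to read, here is a small sketch that splits it into its apparent fields. parse_pixel_host is a hypothetical helper, the field meanings are inferred only from the December 2024 example above, and it assumes an IPv4 address.

```shell
# Hypothetical helper: split a pixel URL's hostname on dots.
# Assumed layout: <4 IPv4 octets>.<country code>.<tag>.<9-digit number>.pixel.archive.md
parse_pixel_host() {
    host=${1#https://}   # drop the scheme
    host=${host%%/*}     # drop the path
    IFS=. read -r a b c d cc tag num rest <<EOF
$host
EOF
    printf 'ip=%s.%s.%s.%s cc=%s tag=%s id=%s\n' "$a" "$b" "$c" "$d" "$cc" "$tag" "$num"
}

# Placeholder values from the example above.
parse_pixel_host 'https://205.1.1.1.cc.VSY1.123456789.pixel.archive.md/x.gif'
# prints: ip=205.1.1.1 cc=cc tag=VSY1 id=123456789
```

The point is that the visitor's IP address and country code appear in the hostname itself, so the name would also be visible to whatever resolves it.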

  • >Anonymous middleman that could potentially collect browsing histories

    So literally any site? What's the alternative, using something like bypass paywalls clean, and allow it to access your browsing history AND steal your cookies?

    >serves CAPTCHAS

    I don't like it, but it's understandable given the load from AI scrapers. Do you also get upset at kernel.org for putting up Anubis?

    >requires Javascript

    So most sites?

    >The site used to include a tracking pixel containing the visitor's IP address

    ???

    They couldn't get visitor IP through logs?

  • Are you accusing archive.today of being a honeypot for the feds because they use Cloudflare? That's a bit much don't you think?

    • Archive.today doesn't use Cloudflare; the admin mimics their captcha page because he hates them. He also used to captcha-loop anyone using Cloudflare's DNS resolver because they don't send the IP subnet of clients to upstreams.

      I don't think it's a honeypot, though, it's not like he's learning much about me other than I like not paying for news sites.