Comment by 1vuio0pswjnm7

19 days ago

Alternative to archive.ph, no Javascript, no CAPTCHAs:

   x=www.washingtonpost.com 
   { 
   printf 'GET /technology/2026/02/07/ai-spending-economy-shortages/ HTTP/1.1\r\n'
   printf 'Host: %s\r\n' "$x"
   printf 'User-Agent: Chrome/115.0.5790.171 Mobile Safari/537.36 (compatible ; Googlebot/2.1 ; +http://www.google.com/bot.html)\r\n'
   printf 'X-Forwarded-For: 66.249.66.1\r\n\r\n'
   }|busybox ssl_client -n "$x" "$x" > 1.htm
   firefox ./1.htm
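
A minimal sketch of the same request with the host and path as parameters, so it can be reused for other pages. The function name `build_request` is hypothetical and not part of the original comment; the headers are copied unchanged from the snippet above.

```shell
#!/bin/sh
# build_request HOST PATH: print an HTTP/1.1 request on stdout.
# Hypothetical helper; the header values are the ones used in the comment above.
build_request() {
    printf 'GET %s HTTP/1.1\r\n' "$2"
    printf 'Host: %s\r\n' "$1"
    printf 'User-Agent: %s\r\n' 'Chrome/115.0.5790.171 Mobile Safari/537.36 (compatible ; Googlebot/2.1 ; +http://www.google.com/bot.html)'
    printf 'X-Forwarded-For: 66.249.66.1\r\n\r\n'
}

# Usage (needs network access and a busybox build that includes ssl_client):
#   x=www.washingtonpost.com
#   build_request "$x" /technology/2026/02/07/ai-spending-economy-shortages/ |
#       busybox ssl_client -n "$x" "$x" > 1.htm
```

Passing the host and path through `%s` keeps them out of the printf format string, so a `%` or backslash in a URL is sent as-is.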

9 comments

WarOnPrivacy 19 days ago

> Alternative to archive.ph, no Javascript, no CAPTCHAs:

The other TLDs have been kinder to me (no CAPTCHA).

https://archive.md/J8pg5

https://archive.fo/J8pg5

1vuio0pswjnm7 19 days ago

Anonymous middleman that could potentially collect browsing histories, serves CAPTCHAS, requires Javascript, maybe in crosshairs of authorities, unreliable according to some commenters (YMMV). NB. Nothing about "honeypot", just observations (cf. "accusations")

Someone recently noticed what appeared to be a DDoS attempt against a blogger, carried out with the Javascript fetch function

The site used to include a tracking pixel containing the visitor's IP address

Also used to ping mail.ru

Would need to look at the page source again to see what it contains today

It's a crowd favorite

People love it

1vuio0pswjnm7 18 days ago
A "Firefox app" could "access browsing history", "steal cookies", whatever unless (a) the user removes the app's ability to "phone home" from the source code then compiles the app themselves and (b) the user controls the servers to which the app is allowed to connect
For example, Firefox app by default, without any input from the user, tries to make connections to Mozilla servers such as
   content-signature-2.cdn.mozilla.net
   firebaseremoteconfig.googleapis.com
   firefox.settings.services.allizom.org
   firefox.settings.services.mozilla.com
   services.addons.mozilla.org
   detectportal.firefox.com
   contile.services.mozilla.com
The opportunities for Mozilla app developers to send data, e.g., browsing history, cookies, usage statistics, crash reports, empty requests (pings), whatever, to Mozilla or to any third party are without limit unless (a) and (b) are addressed
Even uBlock Origin tries to connect to ublockorigin.pages.dev by default, i.e., without any input from the user
The user might want this connection to occur but because it is a default the user might also not even know the connection is being made. These connections are a developer choice not a user choice. The user might agree with the choice
- 1vuio0pswjnm7 17 days ago
  
  Correction: /googleapis.com/d (another third party server, not a Mozilla server)
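
  One crude way to approximate condition (b), sketched here as an assumption rather than anything the commenter described, is to turn a list of hostnames into hosts-file entries that resolve to 0.0.0.0. The function name `hosts_block` is hypothetical.

```shell
#!/bin/sh
# hosts_block: read whitespace-separated hostnames on stdin and print
# one hosts-file line per name, pointing it at 0.0.0.0.
# Hypothetical helper; it only prints lines, it does not edit /etc/hosts.
hosts_block() {
    awk '{ for (i = 1; i <= NF; i++) print "0.0.0.0 " $i }'
}

# Usage: review the output before appending it to the hosts file.
#   printf '%s\n' detectportal.firefox.com contile.services.mozilla.com | hosts_block
```

  This only affects lookups that go through the system resolver. An app that resolves names itself, for example over DNS-over-HTTPS, would not be stopped by it.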
1vuio0pswjnm7 19 days ago
Third party anonymous middleman that can observe what site(s) a user wants to read
Keywords: anonymous, third party
The websites with the webpages that a user seeks to read, e.g., some page on www.washingtonpost.com, are not third party websites. They are "first party" websites
Other archives are third parties but are generally not run by anonymous operators that keep shifting between different IP addresses and domain names
Other archives generally do not serve CAPTCHAs or require Javascript
Will provide examples if requested
No Anubis:
   {
   printf 'GET / HTTP/1.0\r\n'
   printf 'Host: www.kernel.org\r\n\r\n'
   }|busybox ssl_client 146.75.109.55
NB. Replace 205.1.1.1 with user's IP address, replace cc with country code, replace 123456789 with some 9-digit number
</script></div></div><img style="position:absolute" width="1" height="1" src="https://205.1.1.1.cc.VSY1.123456789.pixel.archive.md/x.gif"><script type="text/javascript">
This is from December 2024. May have changed since then
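
A rough sketch for checking whether a saved copy of a page still carries a pixel of this shape. It assumes the hostname layout of the single December 2024 sample above (four IP octets, country code, one label of unknown meaning, a 9-digit number, then `pixel.archive.<tld>`); the function name `pixel_fields` is hypothetical.

```shell
#!/bin/sh
# pixel_fields: read HTML on stdin and print the IP address, country code
# and number embedded in any *.pixel.archive.<tld>/x.gif URL.
# The label layout is assumed from one sample and may no longer apply.
pixel_fields() {
    grep -o 'https://[^"/]*\.pixel\.archive\.[a-z]*/x\.gif' |
        sed -e 's|^https://||' -e 's|/x\.gif$||' |
        awk -F. '{ printf "ip=%s.%s.%s.%s cc=%s id=%s\n", $1, $2, $3, $4, $5, $7 }'
}

# Usage, with a hypothetical saved page:
#   pixel_fields < saved-page.htm
```

No output means no URL of that shape was found in the input.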
gruez 19 days ago

>Anonymous middleman that could potentially collect browsing histories
So literally any site? What's the alternative, using something like bypass paywalls clean, and allow it to access your browsing history AND steal your cookies?
>serves CAPTCHAS
I don't like it, but it's understandable given the load from AI scrapers. Do you also get upset at kernel.org for putting up Anubis?
>requires Javascript
So most sites?
>The site used to include a tracking pixel containing the visitor's IP address
???
They couldn't get visitor IP through logs?
1vuio0pswjnm7 18 days ago

Is the tracking pixel a hack to get around the absence of EDNS Client Subnet?
dmix 19 days ago
Are you accusing archive.today of being a honeypot for the feds because they use Cloudflare? That's a bit much, don't you think?
- MallocVoidstar 19 days ago
  
  Archive.today don't use Cloudflare, the admin mimics their captcha page because he hates them. He also used to captcha-loop anyone using Cloudflare's DNS resolver because they don't send the IP subnet of clients to upstreams.
  I don't think it's a honeypot, though, it's not like he's learning much about me other than I like not paying for news sites.