Archive.today: on the trail of mysterious guerrilla archivists of the Internet

2 years ago (gyrovague.com)

I don't think they will appreciate this article. It is not quite doxxing, but it publishes the results of a hunt for the person's name and location, and it can safely be assumed they don't want that known, since they have never published it despite being quite high-profile.

The nixos link was edited and removed by the author 3 hours after this submission was posted to HN.

Checking who wrote this blog, the About starts with:

> Jani Patokallio was first bitten by the travel bug at the age of 8 months and hasn't managed to shake it yet. Halfway through racking up 650,000 flight miles

Sounds like a nice person (next, they'll tell us how much plastic they bought in a lifetime!), but that aspect aside, I'm not seeing any reason why the archive owner's personal details should need to be dug into...

archive.today or archive.is - Wikipedia: https://en.wikipedia.org/wiki/Archive.today

Help:Using archive.today - Wikipedia: https://en.wikipedia.org/wiki/Help:Using_archive.today

archive.today - FAQ : https://archive.md/faq

archive.today - wiki : https://wiki.archiveteam.org/index.php/Archive.today

Archive Team wiki : https://wiki.archiveteam.org/

archive.today - Blog : https://blog.archive.today/

Tumblr : https://archive-is.tumblr.com/

Twitter : https://twitter.com/archiveis

  archive.today

  archive.ph

  archive.is

  archive.li

  archive.vn

  archive.fo

  archive.md

  archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion

Launched May 16, 2012; 11 years ago

  • Thank you. I checked them all, and every one needs the CAPTCHA every single time. I think these are all the same site, just on different domains.

    Though I am lucky compared to other users: I only need to click a box and wait ~1.5 secs, with no pictures or endless loop.

  • What is the purpose of the mail.ru embedded link on the archive sites?

    • I think that's code for some Google Analytics-like service (from googling for "top-fwz1.mail.ru/counter"), I mean mail.ru is not only email, but a Yandex/Google-like conglomerate of various web services.

I've also been intrigued about the owner of archive.is and have looked into it a couple of times, but what I managed to find is pretty much all the same stuff as mentioned here.

One interesting thing I'd like to mention is these tweets[1] by archive.is from when he was supposedly questioned about something at the Finnish-Russian border and, as a result, blocked the entire site in Finland, although he later lifted the block. I also couldn't find any information about the "Russia vs. http://archive.org case" he mentions in the tweet.

1: https://archive.is/Pum1p

> Donations these days are via Liberapay, an obscure French non-profit organization, and YC-backed startup BuyMeACoffee.

I am not sure why Liberapay is qualified as 'obscure'. Their website's "legal" page [0] clearly identifies the organization and its legal representative, while providing contact details. The status of the non-profit organization can be verified in the French government's website [1].

[0]: https://en.liberapay.com/about/legal

[1]: https://www.journal-officiel.gouv.fr/pages/associations-deta... (in French)

  • It's obscure because it is not well known, not because it is shady. I had never heard of them before either.

  • I interpreted the use of the word "obscure" as meaning "little-known" rather than "secretive".

    I would expect that only a relatively small proportion of people would have heard of Liberapay, so I think calling them "obscure" is not wrong.

  • It isn't; it's been around for ages and is specifically a trusted organization handling donations for open source projects.

    Chalk the snotty comment, and irrational YC worship, up to your basic herd mentality. The author is, I would bet, a typical HN poster.

The idea that every good project should be a full-scale financially stable enterprise with a proper administrative team and dedicated supportive fan club is severely limiting (and is indirectly telling you to know your place in existing chains of power made that specific way). More often than not, services like those are made not by underground kingpins, but by common people who happened to be in the right place at the right time. For example, torrents.ru was once just one of the many regional and global torrent trackers, sometimes run by literal teenagers (albeit that one had the best domain name). Look at it today.

Also, «Маша» (Masha) and «Мойша» (Moishe/Moshe) are completely different names, and I've never ever seen anyone using the former for the latter. Either the author stretches it a bit too much, or the author knows something that should not be publicly revealed in the manner they chose (and the whole post is just an intimidating leak).

Anyway, if the author(s) have a successful illegal business, as implied, they shouldn't have any difficulty acquiring enough spare identities to burn. As a side note, it's quite ironic that “security” is such an idol today that common people need to go out of their way to evade tracking, while even petty internet criminals buy virtual identities in bulk, and have special instrumented browsers to load fake system data with one click.

With archive.today, sometimes IP addresses may stop working. The following ones appear to be still working.

    x=23.137.248.133 # NL
    x=41.77.143.21 # GB
    x=51.38.69.52 # GB
    x=51.79.250.183 # SG
    x=79.133.51.130 # DE
    x=89.253.237.217 # RU
    x=90.156.209.190 # RU
    x=91.193.43.144 # NL
    x=94.140.114.194 # LV
    x=130.0.232.208 # UA
    x=139.99.171.251 # AU
    x=139.99.89.157 # SG
    x=178.17.174.208 # MD
    x=178.250.243.66 # RU
    x=185.101.35.175 # NO
    x=185.125.168.154 # NO
    x=188.143.233.210 # RU
    x=192.124.216.250 # RU
    x=192.210.214.166 # US
    x=193.148.248.205 # NL
    x=193.233.203.196 # MD
    x=217.197.116.88 # RU

To test, something like

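    # Talk to one candidate IP directly, supplying the Host header by hand
    printf 'GET /timemap/example.com HTTP/1.0\r\nhost: archive.is\r\nconnection: close\r\n\r\n' | openssl s_client -connect "$x:443" -ign_eof

    # Or pin the hostname to that IP (needs root), then fetch over HTTP/1.0 with no User-Agent header
    echo "$x archive.is" >> /etc/hosts
    curl -0A "" https://archive.is/timemap/example.com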
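
If editing /etc/hosts for every candidate gets tedious, something like the following might work instead. It is only a sketch: it assumes curl's `--resolve` option is available, reuses the archive.is host and timemap path from above, and the function names (`resolve_arg`, `probe_ip`, `probe_all`) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: HOST and the timemap path come from the test commands above;
# the function names are hypothetical.
HOST=archive.is

# Build the value for curl's --resolve option (host:port:address).
resolve_arg() {
    printf '%s:443:%s\n' "$HOST" "$1"
}

# Print the HTTP status code that one candidate IP returns for the timemap URL.
# Uses HTTP/1.0 and no User-Agent header, like the curl call above.
probe_ip() {
    curl -0 -s -o /dev/null -A "" --max-time 10 \
        --resolve "$(resolve_arg "$1")" \
        -w '%{http_code}\n' "https://$HOST/timemap/example.com"
}

# Print "IP STATUS" for every IP given as an argument.
probe_all() {
    for ip in "$@"; do
        printf '%s %s\n' "$ip" "$(probe_ip "$ip")"
    done
}
```

Called as `probe_all 23.137.248.133 41.77.143.21`, this should print one line per address without touching /etc/hosts, so it needs no root. What counts as "working" is left to the reader, since a CAPTCHA page can also come back with a normal status code.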

  • Less than 24 hours later, all of these except one no longer work. Numerous HN commenters have an affinity for archive.is, dropping links on countless submissions. These do not work for everyone.

Interesting read. I've thought about this for a while.

My woe with the site is that my connections to any of the clearnet domains seem to get black-holed, or completely blocked by Cloudflare, while using Tor. The onion site works fine for viewing, but to archive pages I need to complete the extremely difficult Cloudflare CAPTCHA.

  • The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha. They use recaptcha (not sure if that's possible with cloudflare), the `server` header doesn't == 'cloudflare', accessing by direct ip gives "hello world" instead of the "Direct IP access not allowed" cloudflare message, /cdn-cgi/trace isn't accessible.

    Not sure why they do that. Is it just because it looks decent, or is it poking fun, maybe because of their issue with 1.1.1.1?

    • >The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha.

      That's amazing, I never bothered to take a look once I saw that page but I did just now, and you're right. Google reCAPTCHA skinned as Cloudflare, hysterical.
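
Two of the checks mentioned in this subthread (the `server` header and /cdn-cgi/trace) could be scripted roughly as below. This is a sketch under assumptions: the function names are invented, and it assumes that a Cloudflare-fronted site answers /cdn-cgi/trace with plain `key=value` lines that include a `colo=` entry. Neither check is conclusive on its own.

```shell
#!/bin/sh
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# only if the Server header is exactly "cloudflare" (case-insensitive).
server_is_cloudflare() {
    tr -d '\r' | grep -qi '^server:[[:space:]]*cloudflare[[:space:]]*$'
}

# Hypothetical helper: succeeds if /cdn-cgi/trace on the given host returns
# a "colo=" line (assumed here to be typical for Cloudflare-fronted sites).
has_cf_trace() {
    curl -s --max-time 10 "https://$1/cdn-cgi/trace" | grep -q '^colo='
}

# Print a short report for one hostname.
check_site() {
    if curl -sI --max-time 10 "https://$1/" | server_is_cloudflare; then
        echo "$1: server header says cloudflare"
    else
        echo "$1: server header is not cloudflare"
    fi
    if has_cf_trace "$1"; then
        echo "$1: /cdn-cgi/trace looks like Cloudflare"
    else
        echo "$1: /cdn-cgi/trace not available"
    fi
}
```

Usage would be `check_site archive.is`. The direct-IP test from the comment is left out because it depends on which address you pick.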

>> Github ... account called “volth” ... contributed ... to NixOS

>Volth maintained NixOS Perl subsystem:

https://github.com/NixOS/nixpkgs/commits/master?after=1c72dc...

>> The obvious denispetrov.com ... programmer ... a New Yorker ... end of a 25-year career and the blog dries up entirely in 2011, so it doesn’t match the place or time

>A Perl programmer: http://web.archive.org/web/20050208095206/http://www.denispe...

Archive.is started in 2012, just after that retirement, so why do these not match?

What also worries me a bit is that Wikipedia started to use them in their references, to archive paywalled references.

Generally I enjoy archive.today very much, but it seems to be a labour of love which can go away any moment (despite its apparent resiliency), rather than something for the ages...

  • Wikipedia does not require that references be free to access. Most books, journals and physical newspapers fall into this category. So editors are perfectly entitled to reference paywalled articles. The fact that you can access some of these through Archive.today is really just a nice bonus. I am a bit concerned about the possibility that the service might just vanish someday, but I don't think that's a reason not to use it.

    • The problem is more that a lot of references will just disappear. It just so happened that earlier today I opened 4 or 5 references for an event that happened around 2009-2012. They all either returned a 404 or just redirected me to the homepage.

      This is why I consider these types of archives important: not so much to bypass paywalls, but to ensure content is still available in a decade, or two decades.

It's strange to hear the author isn't a fan of cryptocurrency. There are a lot of dubious use-cases for crypto, but facilitating donations for sketchy services is an obvious one.

Why Archive.today

HN readers, commenters and story submitters are aware of the FAQ entry 'Are paywalls ok?':

'It's ok to post stories from sites with paywalls that have workarounds.' (https://news.ycombinator.com/newsfaq.html)

That means publishers of paywalled sites can be featured on HN, which is a prime site for attracting potential new paying subscribers and replacing natural subscriber attrition.

Some publishers see this as a positive: free samples, a minor part of the publication's full output, much like a paid agent in a supermarket handing out cheese or spiced meat on a stick to encourage new customers.

Other Publishers see this as mice nibbling at their cheese.

I use archive.is because almost all of the articles submitted to HN are already archived, so it is a simple copy and paste.

archive.is does not require JavaScript to read or to archive an article; it is fast, and the archiving scripts work on most sites.

Interesting read!

For a while now, I've had infrequently occurring arcane cert/SSL issues connecting to archive.ph and its siblings, but trying a couple of links from the article I find I can't get past an endless cycle of "one more step" captcha protection - tried clearing all cookies and revisiting an old url, but to no avail.