Archive.today: on the trail of mysterious guerrilla archivists of the Internet

3 years ago (gyrovague.com)

65 comments

resolutebat

I don't think they'd appreciate this article. It's not quite doxxing, but it publishes the results of a hunt for the person's name and location, when it can safely be assumed they don't want that known: they have never published it despite being quite high-profile.

The nixos link was edited and removed by the author 3 hours after this submission was posted to HN.

Checking who wrote this blog, the About starts with:

> Jani Patokallio was first bitten by the travel bug at the age of 8 months and hasn't managed to shake it yet. Halfway through racking up 650,000 flight miles

sounds like a nice person (next, they'll tell us how much plastic they bought in a lifetime!), but that aspect aside, I'm not seeing any motive for why archive's personalia should need to be dug into...

archo 3 years ago

archive.today or archive.is - Wikipedia: https://en.wikipedia.org/wiki/Archive.today

Help:Using archive.today - Wikipedia: https://en.wikipedia.org/wiki/Help:Using_archive.today

archive.today - FAQ : https://archive.md/faq

archive.today - wiki : https://wiki.archiveteam.org/index.php/Archive.today

Archive Team wiki : https://wiki.archiveteam.org/

archive.today - Blog : https://blog.archive.today/

Tumblr : https://archive-is.tumblr.com/

Twitter : https://twitter.com/archiveis

  archive.today

  archive.ph

  archive.is

  archive.li

  archive.vn

  archive.fo

  archive.md

  archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion

Launched May 16, 2012; 11 years ago

bmacho 3 years ago
Thank you. I checked them all, all need the CAPTCHA every single time. I think these are all the same, but on different domains.
Though I am lucky compared to other users, I only need to click a box, and wait ~1.5 secs, no pictures or endless loop.
- abwizz 3 years ago
  
  try switching your dns resolver, as mentioned in thread
- jrochkind1 3 years ago
  
  I have an endless loop of captchas, it never lets me in!
  
LinuxBender 3 years ago
What is the purpose of the mail.ru embedded link on the archive sites?
- r721 3 years ago
  
  I think that's code for some Google Analytics-like service (from googling for "top-fwz1.mail.ru/counter"), I mean mail.ru is not only email, but a Yandex/Google-like conglomerate of various web services.

Stagnant 3 years ago

I've also been intrigued by the owner of archive.is and have looked into it a couple of times, but what I managed to find is pretty much all the same stuff as mentioned here.

One interesting thing I'd like to mention is these tweets[1] by archive.is from when he was supposedly questioned about something at the Finnish-Russian border, as a result of which he blocked the entire site in Finland, although he later lifted the block. I also couldn't find any information about the "Russia vs. http://archive.org case" he mentions in the tweet.

1: https://archive.is/Pum1p

ogurechny 3 years ago

Well, formally, there is a prosecutor's office or court decision behind every new block by the Russian censorship agency, which then merely implements it (and demands that services remove the information or get blocked), but even they have stopped pretending they are not in control of rubber-stamping the papers.
https://reestr.rublacklist.net/en/?page=12&q=archive.org
You can see the fine collection of everything from The Anarchist Cookbook to le ironic nasheed remixes, and from exposures of Astral Jews to Alex Jones there:
https://archive.org/details/geo_restricted?tab=collection

me_bx 3 years ago

> Donations these days are via Liberapay, an obscure French non-profit organization, and YC-backed startup BuyMeACoffee.

I am not sure why Liberapay is qualified as 'obscure'. Their website's "legal" page [0] clearly identifies the organization and its legal representative, while providing contact details. The status of the non-profit organization can be verified in the French government's website [1].

[0]: https://en.liberapay.com/about/legal [1]: https://www.journal-officiel.gouv.fr/pages/associations-deta... - in French

Lacerda69 3 years ago

It's obscure because it is not well known, not because it is shady. I had never heard of them before either.
ldjb 3 years ago

I interpreted the use of the word "obscure" as meaning "little-known" rather than "secretive".
I would expect that only a relatively small proportion of people would have heard of Liberapay, so I think calling them "obscure" is not wrong.
the_biot 3 years ago

It isn't; it's been around for ages and is specifically a trusted organization handling donations for open source projects.
Chalk the snotty comment, and irrational YC worship, up to your basic herd mentality. The author is, I would bet, a typical HN poster.

ogurechny 3 years ago

The idea that every good project should be a full scale financially stable enterprise with a proper administrative team and dedicated supportive fan club is severely limiting (and is indirectly telling you to know your place in existing chains of power made that specific way). More often than not, services like those are made not by underground kingpins, but by common people who happened to be in the right place at the right time. For example, torrents.ru was once just one of the many regional and global torrent trackers, sometimes run by literal teenagers (albeit that one had the best domain name). Look at it today.

Also, «Маша» (Masha) and «Мойша» (Moishe/Moshe) are completely different names, and I've never ever seen anyone using the former for the latter. Either the author stretches it a bit too much, or the author knows something that should not be publicly revealed in the manner they chose (and the whole post is just an intimidating leak).

Anyway, if the author(s) have a successful illegal business, as implied, they shouldn't have any difficulty acquiring enough spare identities to burn. As a side note, it's quite ironic that “security” is such an idol today that common people need to go out of their way to evade tracking, while even petty internet criminals buy virtual identities in bulk, and have special instrumented browsers to load fake system data with one click.

1vuio0pswjnm7 3 years ago

With archive.today, sometimes IP addresses may stop working. The following ones appear to be still working.

    x=23.137.248.133 # NL
    x=41.77.143.21 # GB
    x=51.38.69.52 # GB
    x=51.79.250.183 # SG
    x=79.133.51.130 # DE
    x=89.253.237.217 # RU
    x=90.156.209.190 # RU
    x=91.193.43.144 # NL
    x=94.140.114.194 # LV
    x=130.0.232.208 # UA
    x=139.99.171.251 # AU
    x=139.99.89.157 # SG
    x=178.17.174.208 # MD
    x=178.250.243.66 # RU
    x=185.101.35.175 # NO
    x=185.125.168.154 # NO
    x=188.143.233.210 # RU
    x=192.124.216.250 # RU
    x=192.210.214.166 # US
    x=193.148.248.205 # NL
    x=193.233.203.196 # MD
    x=217.197.116.88 # RU

To test, something like

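    # Option 1: speak HTTP directly to the IP over TLS, no DNS involved
    printf 'GET /timemap/example.com HTTP/1.0\r\nhost: archive.is\r\nconnection: close\r\n\r\n' | openssl s_client -connect "$x:443" -ign_eof

    # Option 2: pin the name in /etc/hosts (needs root), then fetch with curl
    echo "$x archive.is" >> /etc/hosts
    curl -0A "" https://archive.is/timemap/example.com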
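
A minimal sketch of the same test as a loop, assuming curl is installed: `--resolve` pins archive.is to one candidate IP for a single request, so /etc/hosts is left alone. The names `probe_ip`, `probe_list` and the `CURL` override are made up for illustration and are not from the comment above.

```shell
# Sketch: probe candidate IPs for archive.is without editing /etc/hosts.
# probe_ip, probe_list and the CURL override are hypothetical names.

HOST=archive.is
CURL=${CURL:-curl}

# Print the HTTP status code returned when HOST is pinned to the IP in $1.
# curl prints 000 when it received no HTTP response at all.
probe_ip() {
    "$CURL" -s -0 -A "" -o /dev/null -w '%{http_code}' \
        --connect-timeout 5 \
        --resolve "$HOST:443:$1" \
        "https://$HOST/timemap/example.com"
}

# Read "x=IP # CC" lines on stdin and print "IP status" for each one.
probe_list() {
    while IFS= read -r line; do
        case $line in
            *x=*) ;;            # only handle lines that carry an x=IP entry
            *) continue ;;
        esac
        ip=${line#*x=}          # drop everything up to and including "x="
        ip=${ip%%[ #]*}         # drop the trailing " # CC" country comment
        [ -n "$ip" ] || continue
        printf '%s %s\n' "$ip" "$(probe_ip "$ip" </dev/null)"
    done
}
```

For example, saving the list to a file and running `probe_list < ips.txt` would print one `IP status` line per address; a status of 000 means curl got no HTTP response from that IP.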

1vuio0pswjnm7 3 years ago

Less than 24 hours later, all of these except one have stopped working. Numerous HN commenters have an affinity for archive.is, dropping links on countless submissions. These do not work for everyone.

swapfile 3 years ago

Interesting read. I've thought about this for a while.

My woe with the site is that, while using Tor, my connections to any of the clearnet domains seem to get blackholed or completely blocked by Cloudflare. The onion site works fine for viewing, but to archive pages I need to complete the extremely difficult Cloudflare CAPTCHA.

Bu9818 3 years ago
The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha. They use recaptcha (not sure if that's possible with cloudflare), the `server` header doesn't == 'cloudflare', accessing by direct ip gives "hello world" instead of the "Direct IP access not allowed" cloudflare message, /cdn-cgi/trace isn't accessible.
Not sure why they do that. Is it just because it looks decent, or is it poking fun, maybe because of their issue with 1.1.1.1?
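
The header and /cdn-cgi/trace checks could be scripted roughly like this. A sketch only: both function names are hypothetical, and the assumption that a genuine Cloudflare trace page is a list of key=value lines including `colo=` is mine, not something stated in the comment.

```shell
# Sketch: two offline checks on saved HTTP output.
# Both function names are hypothetical.

# Succeed if the response headers on stdin carry "Server: cloudflare".
has_cloudflare_server_header() {
    tr -d '\r' | grep -qi '^server:[[:space:]]*cloudflare[[:space:]]*$'
}

# Succeed if the body on stdin looks like Cloudflare's /cdn-cgi/trace output.
# Assumption: that output is key=value lines that include a "colo=" entry.
looks_like_cf_trace() {
    grep -q '^colo='
}

# Possible usage (needs network access):
#   curl -sI https://archive.ph/ | has_cloudflare_server_header && echo "server header says cloudflare"
#   curl -s https://archive.ph/cdn-cgi/trace | looks_like_cf_trace || echo "no Cloudflare trace endpoint"
```

Neither check is conclusive on its own, since a site can set any `server` header it likes, but both failing together points the same way as the observations above.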
- swapfile 3 years ago
  
  >The captcha page looks like cloudflare, but I don't think they're using cloudflare, haha.
  That's amazing, I never bothered to take a look once I saw that page but I did just now, and you're right. Google reCAPTCHA skinned as Cloudflare, hysterical.

rejectfinite 3 years ago

He is a king amongst men.

Incredible service to humanity.

defrost 3 years ago
Sure, ... unless the person behind the “Denis Petrov” nom de guerre is another Alexandra Elbakyan.
- rejectfinite 3 years ago
  
  Why is that bad? From searching, she is behind sci-hub?
  
- pests 3 years ago
  
  Did Alexandra Elbakyan do something wrong? Why wouldn't you want to be compared to them?

nora-puchreiner 3 years ago

>> Github ... account called “volth” ... contributed ... to NixOS

>Volth maintained NixOS Perl subsystem:

https://github.com/NixOS/nixpkgs/commits/master?after=1c72dc...

>> The obvious denispetrov.com ... programmer ... a New Yorker ... end of a 25-year career and the blog dries up entirely in 2011, so it doesn’t match the place or time

>A Perl programmer: http://web.archive.org/web/20050208095206/http://www.denispe...

Archive.is started in 2012, just after retirement. Why do these not match?

yamrzou 3 years ago

Who is archiving the archive?

not_your_vase 3 years ago

What also worries me a bit is that Wikipedia started to use them in their references, to archive paywalled references.

Generally I enjoy archive.today very much, but it seems to be a labour of love which can go away any moment (despite its apparent resiliency), rather than something for the ages...

ldjb 3 years ago
Wikipedia does not require that references be free to access. Most books, journals and physical newspapers fall into this category. So editors are perfectly entitled to reference paywalled articles. The fact that you can access some of these through Archive.today is really just a nice bonus. I am a bit concerned about the possibility that the service might just vanish someday, but I don't think that's a reason not to use it.
- arp242 3 years ago
  
  The problem is more that a lot of references will just disappear. It just so happened that earlier today I opened 4 or 5 references for an event that happened around 2009-2012. They all either gave a 404 or just redirected me to the homepage.
  This is why I consider these types of archives important: not so much to bypass paywalls, but to ensure content is still available in a decade, or two decades.
  

k-ian 3 years ago

It's strange to hear the author isn't a fan of cryptocurrency. There are a lot of dubious use-cases for crypto, but facilitating donations for sketchy services is an obvious one.

omnimus 3 years ago

The person probably has a strong sense of ethics (it would fit the project too).
mikrotikker 3 years ago

What is sketchy about it?

archo 3 years ago

Why Archive.today

HN Readers, Commentators and Story Submitters are aware of the FAQ entry 'Are paywalls ok?':

It's ok to post stories from sites with paywalls that have workarounds. : https://news.ycombinator.com/newsfaq.html

This means Publishers of paywalled sites can be featured on HN,

which is a prime site for garnering potential paying new prospects and to replace natural attrition of subscribers.

Some Publishers see this as a positive: free samples, a minor amount of the Publication's full output,

much like a paid agent in a supermarket giving out cheese and/or spiced meat on a stick etc, to encourage new users.

Other Publishers see this as mice nibbling at their cheese.

I use archive.is because almost all of the Articles submitted to HN are already archived, so it is a simple copy and paste.

archive.is does Not require JavaScript to read or to archive an Article,

it is fast and the archiving scripts work on most sites.

ttctciyf 3 years ago

Interesting read!

For a while now, I've had infrequently occurring arcane cert/SSL issues connecting to archive.ph and its siblings, but trying a couple of links from the article I find I can't get past an endless cycle of "one more step" captcha protection - tried clearing all cookies and revisiting an old url, but to no avail.

resolutebat 3 years ago

archive.today is the "official" name, which redirects to the domain of choice (right now archive.md, at least for me).
archive.is is blackholed in many places.
Trouble_007 3 years ago
Change your DNS - you are using CF
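
To see whether the resolver matters, one could compare what each public resolver returns for the same name. A rough sketch, assuming `dig` is installed; `compare_resolvers` and the `DIG` override are hypothetical names.

```shell
# Sketch: compare the A records that different resolvers return for a name.
# compare_resolvers and the DIG override are hypothetical names.

DIG=${DIG:-dig}

# Usage: compare_resolvers NAME RESOLVER...
# Prints one line per resolver: "RESOLVER: answer answer ..."
compare_resolvers() {
    name=$1
    shift
    for r in "$@"; do
        # +short prints only the answers; +time/+tries keep a dead resolver from hanging
        answers=$("$DIG" "@$r" "$name" A +short +time=3 +tries=1 | tr '\n' ' ')
        printf '%s: %s\n' "$r" "${answers% }"
    done
}

# Possible usage (needs network access):
#   compare_resolvers archive.is 1.1.1.1 8.8.8.8 9.9.9.9
```

If one resolver's line is empty or lists different addresses from the others, that resolver is a plausible suspect, though differing answers can also just reflect geographic load balancing.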
- ttctciyf 3 years ago
  
  Are you suggesting the cert problem is DNS related or the new captcha issue?
  DNS was ISP, not 1.1.1.1, and I get the same behaviour after switching to 8.8.8.8.
  
- stonogo 3 years ago
  
  I'm using Quad 9 and getting the same results. Who is the right DNS provider?
  