Perkeep – Open-source data modeling, storing, search, sharing and synchronizing

8 years ago (perkeep.org)

110 comments

noncoml

mastax 8 years ago

Since people are confused about what this is, I'll write a summary (from old memory, so it's probably 80% correct).

It is a consumer-oriented storage system that is:

- Content addressable

- Indexed

- Tag-oriented (vs. hierarchical)

- Permissions, encryption, compression, sharing, etc.

- Spans storage across machines and clouds

- FUSE mountable

- Has CLI and Web interfaces built-in

The intent is to be a personal data dumpster that you can throw all of your files and other data (tweets, etc.) into for search and backup.

The website could be better organized to convey this information quickly.

bradfitz 8 years ago
Camlistore (renamed to Perkeep) author here.
It is true that the website needs some love & updated docs. We've been working on Camlistore for 8 years now (with a few drier spells) but our focus has never been marketing. If anything, we didn't want too many non-nerd users for a number of years because it wasn't ready for non-developer usage. That's starting to change.
We have pretty good docs for configuration and such, but we lack some concise high-level text about what the project is and why.
I'll prioritize that.
- jasim 8 years ago
  
  For everyone else reading this, here's more context. I once tried creating durable physical storage that spanned multiple external hard-disks with a single logical schema, but then discovered Camlistore and git-annex and decided to let more competent people build it.
  The idea is that we should be able to own and manage our personal data - which runs into terabytes across one lifetime - without having to trust and/or pay the big cloud companies. So Camlistore from its earliest days had an integrated photo gallery, since multimedia is where most of the bytes are consumed.
  The whole thing once went under the label of the IndieWeb movement (which we should revive), and Wired wrote about it here - https://www.wired.com/2013/08/indie-web/
  Brad Fitzpatrick is also the creator of LiveJournal where he wrote the original version of Memcached in Perl. He also wrote OpenID, and then went on to work with Rob Pike and team on the Go Programming language. Camlistore was one of the earliest projects written in Go (before Hashicorp made it cool) and I imagine that had something to do with him getting into the language itself, but that's for Brad to clarify :)
  
- catwell 8 years ago
  
  It would be interesting to have a line or two about the differences and potential synergies with Upspin (https://upspin.io).
  
- InternetOfStuff 8 years ago
  
  Thanks a lot for chiming in.
  Sorry to say I'm still confused by what Camlistore does.
  Would it be fair to say it's similar to Syncthing[1]?
  [1] https://syncthing.net/

nayuki 8 years ago

Some previous threads on Camlistore/Perkeep:

* [2014 Jun] https://news.ycombinator.com/item?id=2156374

euske 8 years ago

The thing is that nothing is good enough for keeping data for a lifetime. Hardware might break, a supply might be discontinued, and a software maintainer might disappear. You'll need to keep refreshing the data from one device to another for the rest of your life. That said, I'm curious how easily this system can handle porting from one device or service to another, in varying formats and architectures. The only way to stay relevant is to constantly keep changing/adapting to new things.

bradfitz 8 years ago
A huge focus of the project is on human-readable schemas and formats. Even if all specs & source code of the project are lost, the data should still be recoverable by a curious archaeologist.
Between replicating between several companies as well as your own hardware & having friends & family mirror your stuff (encrypted or not), the idea is that some copies will continue to exist.
Hardware failures are a given. Companies failing and friends & family dying is also a given. Natural disasters too. The only option seems to be trusting nothing and replicating all your data to lots of places, in future-friendly formats, and that's what Perkeep aims to do. And then a ton of tooling on top of that.
- euske 8 years ago
  
  Interesting. I thought plaintext + .tar.gz or .zip format on either FAT or ext2 fs is the best bet for forward compatibility, and anything beyond that is too complex or obscure for future archaeologists. The obvious problem is the searchability, but I'd imagine in future that indexing a few TB of text/image will be a breeze.

davidbanham 8 years ago

Looks like there's been some nice progress since I last looked at Camlistore! The importers from cloud services like Twitter look really interesting.

natural219 8 years ago

Camlistore & Brad Fitzpatrick's original writings are what initially got me into decentralized web advocacy. Since then, I've moved on from this project, since it seems to move at a very slow pace and the authors do not seem very interested in widespread user adoption.

With this name change, I'm slightly more interested again. We'll have to see in the coming months whether they become ready to displace actual large social media platforms or whether it remains a toy project.

nerdponx 8 years ago

How does this work?

jamestomasino 8 years ago
That was my first question, too. After clicking through a few links and even opening up an intro presentation I was left unsatisfied and closed the tab. This project desperately needs an FAQ or overview video up-front.
- davidbanham 8 years ago
  
  The video demo on the front page is a great place to get an overview of what it's all about.
  https://www.youtube.com/watch?v=8Dk2iVlc67M
  
- Timothycquinn 8 years ago
  
  This 24 minute overview gives a good idea as to the fundamentals of their system. https://www.youtube.com/watch?v=yxSzQIwXM1k
- jbob2000 8 years ago
  
  It downloads and catalogs a bunch of crap onto a local hard drive.
  
emmelaich 8 years ago

It's content addressable storage - as used by git and plan9's fossil/venti.
https://perkeep.org/doc/prior-art
https://en.wikipedia.org/wiki/Fossil_(file_system)
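As a rough illustration of what "content addressable" means here, a minimal Python sketch follows. It is not Perkeep's code: the `BlobStore` class and the `sha256-<hex>` reference format are invented for this example. The point is only that the key for a piece of data is derived from the data itself, so identical content is stored once and corruption can be detected on read.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def __len__(self):
        return len(self._blobs)

    def put(self, data: bytes) -> str:
        # The reference is computed from the bytes, never chosen by the caller.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        # Storing the same bytes again lands on the same key, so it is a no-op.
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        data = self._blobs[ref]
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if "sha256-" + hashlib.sha256(data).hexdigest() != ref:
            raise ValueError("blob does not match its reference: " + ref)
        return data


store = BlobStore()
first = store.put(b"holiday photo bytes")
second = store.put(b"holiday photo bytes")  # duplicate upload
```

Here `first` and `second` are the same reference and the store holds a single blob, which is the deduplication property git and venti also rely on.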
random3 8 years ago

https://perkeep.org/doc/overview

linsomniac 8 years ago

I've been watching Camlistore for a few years. I peek in on it every once in a while, long enough between that I usually can't remember the name. I like the look of it, but haven't been convinced to go from my decade old ZFS setup to Camlistore.

I feel like OwnCloud is more compelling, from a glance. Anyone use one or both and able to comment?

bradfitz 8 years ago
Camlistore author here.
If you only store files, sure, use ZFS.
Perkeep (Camlistore) doesn't write to a block device. It has storage backends for a filesystem (which can be ZFS) and any number of cloud object storage providers (S3, GCS, etc).
Perkeep's main value over a fancy POSIX filesystem is storing nameless things (tweets, other social media content + interactions, bookmarks) in common schemas, and permitting search over it all, and then having a variety of ways to browse it (CLI, FUSE, API, web UI, etc).
It's also good at sync to & from things any which way without merge conflicts.
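One way to see why sync "any which way without merge conflicts" is plausible for content-addressed data: if every blob is immutable and named by its hash, syncing reduces to copying the blobs the other side lacks. The sketch below assumes exactly that and nothing more; `blob_ref` and `sync` are hypothetical helpers, plain dicts stand in for storage backends, and it says nothing about how Perkeep reconciles mutable metadata.

```python
import hashlib


def blob_ref(data: bytes) -> str:
    """Name a blob by the SHA-256 of its contents (hypothetical naming scheme)."""
    return hashlib.sha256(data).hexdigest()


def sync(src: dict, dst: dict) -> int:
    """Copy every blob that dst is missing from src; return how many were copied."""
    missing = src.keys() - dst.keys()
    for ref in missing:
        # Two stores can never disagree about what a given ref contains,
        # so there is nothing to merge: a blob is either present or absent.
        dst[ref] = src[ref]
    return len(missing)


laptop = {blob_ref(b): b for b in (b"photo-1", b"tweet-42")}
server = {blob_ref(b): b for b in (b"tweet-42", b"bookmark-7")}

sync(laptop, server)  # server gains photo-1
sync(server, laptop)  # laptop gains bookmark-7
```

After both calls the two stores hold the same three blobs, and running `sync` again in either direction copies nothing.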

_m8fo 8 years ago

How is this any better than just burning your data to a Blu-ray, which lasts centuries when stored under proper conditions (theoretically, anyway)? I need to give this a closer look.

gf263 8 years ago
This is such a classic hacker news comment
- mixedCase 8 years ago
  
  Always a good time for linking the Show HN for Dropbox: https://news.ycombinator.com/item?id=8863
  
ams6110 8 years ago
Not having to worry if there will be any Blu-Ray readers available in a century.
- tantalor 8 years ago
  
  Seriously. The only device I have which can read a CD-ROM is my car. The PS4 can read Blu-Ray and DVD but not CD-ROM.
  
- yeukhon 8 years ago
  
  In a century I don’t even think we’d have CD/Blu-ray. By then most of us would be dead already, so why worry?
  
- carterehsmith 8 years ago
  
  >> Not having to worry if there will be any Blu-Ray readers available in a century.
  Century? Startup sites like the one above last on average 6 months, that is, until they find out that their $6/mo DigitalOcean droplet suddenly costs... $10/mo! Or $100/mo or whatever and then they find out they cannot fund their $100/mo droplet and call it quits.
  So... if you need the data to be around for 100 years, maybe not give it to the random startup.
  
milcron 8 years ago
M-DISC is even better. Burnable discs use an organic dye which oxidizes over time. M-DISC uses a "glassy carbon" layer that is inert to oxidation.
They adhere to DVD-R, BD-R, and BD-XL standards so it's readable in standard disc drives. You need a special drive to burn them, however (requires a high-power laser).
- Veratyr 8 years ago
  
  > Burnable discs use an organic dye which oxidizes over time.
  This is only true of DVDs and a rare variant of Blu-Ray called LTH. Even cheap shitty Blu-Rays from Chinese manufacturers use inorganic dyes these days.
  Also, the French Archives did a test of a variety of DVDs for longevity in adverse conditions and found that M-DISC didn't last significantly longer than competitors, even those with inorganic dyes: https://documents.lne.fr/publications/guides-documents-techn...
  The US DoD also did a similar test under different conditions and found it performed much better than the competition though: http://www.esystor.com/images/China_Lake_Full_Report.pdf
  I suspect the difference between the French and US tests might be the French using a longer test duration and the Americans using light. The French went up to 1000h while the Americans only went to 24 as far as I can tell.
  And unlike DVDs, I haven't seen any studies of longevity for M-DISC Blu-Rays.
davidbanham 8 years ago

It's different (better?) in that it doesn't rely on you remembering to actually burn that data, then store it safely. It comes with an app you can run on your phone to upload all your photos immediately, for instance. It has importers to archive all your tweets automatically, for example. It allows you to outsource the task of "Keep this blu-ray safe" to a cloud provider (or a friend) while encrypting your data to keep it private.

stevekemp 8 years ago

I've been keeping an eye on this project for years, because it seems well-designed, and the authors are very capable developers.

The biggest problem I found was getting documentation on replication. Having two or more servers mirror each other across the internet seems like a good idea, given that otherwise you have a single point of failure as you import all your media/files.

teddyh 8 years ago

I’d be interested in a system for converting existing stuff from, for example, the Firefox “ScrapBook” plugin, to this format. (The ScrapBook plugin is not compatible with Firefox 57’s plugin API, so anyone who upgrades to Firefox 57 immediately loses all their saved ScrapBook pages.)

sp332 8 years ago

I have no idea how compatible this is, but someone is working on a new version. https://addons.mozilla.org/en-US/firefox/addon/scrapbookq/

andrepd 8 years ago

The perfect tool for a digital hoarder like myself. Will follow this with attention.

didibus 8 years ago

So, it's just a document server that can be run over multiple computers? I was expecting something peer to peer. If I understand correctly, you can think of this as a Dropbox that you can self host?

kindfellow92 8 years ago

What is the target audience of this? What are the intended use cases?

Is this supposed to be used directly by users or as an API for a user-facing application? How is this different from a document DB like MongoDB?

flarg 8 years ago

Long time follower of the project here... So far it's been aimed at geeks who want to archive their content from the cloud, eg tweets, but it also stores files. Because of the way it is designed I've always thought there is a compelling use case for its use as a file and object store for organizations where auditing of data records is expected and sharing of data is a requirement.

brotherjerky 8 years ago

So is this ready for prime time yet? I used to follow camlistore, and it was still a little rough even for CLI nerds.

gh02t 8 years ago

So I just downloaded it and played around and as far as I can tell there is no way to delete files. Or, more specifically there is a way but it's not implemented or otherwise accessible as far as I can figure from the rather sparse documentation.
If someone would like to explain to me how (if?) the garbage collection works I'd appreciate it, because I like the concept and kinda want to use this, but deleting stuff is a rather important feature for me. All I could find searching was a post by the devs saying it was already mostly implemented but not finished and not a priority...
https://github.com/camlistore/camlistore/issues/792
Like, I understand that this is a spare time project (I think) but not considering deleting/pruning files to be an important feature is really confusing to me. In its current state, if I accidentally upload the wrong file, am I now stuck with it forever?
Edit: ok I figured out how to at least delete things in the UI (clicking the check mark opens a side menu apparently, `camput delete` doesn't seem to do anything), but as far as I can tell it doesn't actually delete them from the database without running a garbage collect, which isn't implemented so it just hangs around in purgatory.

j7ake 8 years ago

Is this possibly a Dropbox replacement? Do I have to host the files on my own server?

tradersam 8 years ago

Alternatively: "Hard-drives let you permanently keep your stuff, for life"

melq 8 years ago
Hard drives are an especially bad choice for lifetime reasons, and SSDs don't solve the problem either :P
- userbinator 8 years ago
  
  Tape is usually the preferred magnetic media for long-term storage.
- tradersam 8 years ago
  
  I don't agree — that's why things like redundancy are commonplace. :D
  
milcron 8 years ago
Check out M-DISC https://en.wikipedia.org/wiki/M-DISC
- _pwes 8 years ago
  
  It's not clear how much better they are than regular media since there haven't been many tests. There are two that I'm aware of, one by the French Archives (who've done this a few times, it so happens) and one by the US DoD.
  The French found that M-DISC didn't perform much better than regular DVDs and that a weird kind of glass DVD beat everything else hands down.
  The Americans found no errors at all in their tests of M-DISC while all other disks encountered them.
  I suspect the important differences were:
  - The Americans tested the discs after light exposure, the French did not. It may be that the light caused the regular DVDs to fail but not the M-DISC.
  - The French tests were far longer (1000h) than the Americans' (24h). It may be that M-DISC can't survive the adverse conditions past a certain point that the Americans didn't reach.
  Also as far as I'm aware, there are no tests of the Blu-Ray variant of M-DISC.
  Personally, given the cost of M-DISC, I'd buy a few cheap terrible Blu-Rays instead and just make sure they're not exposed to too much light.
  French test: https://documents.lne.fr/publications/guides-documents-techn...
  American: http://www.esystor.com/images/China_Lake_Full_Report.pdf
- bo1024 8 years ago
  
  > While the exact properties of M-DISC are a trade secret
  If long-term accessibility is the goal, not off to a good start...
  

passwordqq2 8 years ago

Question if anybody gets to this: I'm taking a break from work and computers for a year. How would you guys suggest I store my kdbx data securely in a failsafe manner without worrying about forgetting passwords or losing paper chits or USB keys?

Edit: after seeing some good suggestions about physical storage, I've decided to increase the difficulty of the question, hard mode- How would you do this without physical stuff? (more, new answers about physical welcome too)

jacquesm 8 years ago
For something on the timescale of a year I would just keep the system that you already have up and running. If it were much longer than that I'd go with a bank vault that contains the access keys and something like tarsnap and yet another backup with another cloud provider.
- passwordqq2 8 years ago
  
  I'm assuming all my electronics fry, papers burn and memory goes away. (to be safe)
  Bank vault might be a good idea (assuming they id me fine)
Dylan16807 8 years ago
> Edit: after seeing some good suggestions about physical storage, I've decided to increase the difficulty of the question, hard mode- How would you do this without physical stuff? (more, new answers about physical welcome too)
Store one copy in a gmail account, and another on imgur.
> assuming [...] memory goes away. (to be safe)
And tattoo the site+username+pass on your thigh.
- passwordqq2 8 years ago
  
  Good thinking!
quickthrower2 8 years ago
I wonder if a system like this would be good for your general problem:
Generate a random seed sentence of so many words. From the secret seed + site domain name, generate a password.
Store a piece of paper with: the algorithm (could be public on GitHub too), the seed words, and the site names.
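The seed-plus-domain scheme described above could be sketched roughly like this in Python. The choice of PBKDF2-HMAC-SHA256, the iteration count, the base64 encoding and the `site_password` name are all assumptions made for this example; the comment does not specify an algorithm, and the sketch ignores per-site password rules and password rotation.

```python
import base64
import hashlib


def site_password(seed: str, domain: str, length: int = 20) -> str:
    """Derive a repeatable per-site password from a secret seed and a domain."""
    if not 1 <= length <= 43:
        # 32 derived bytes give 43 base64 characters before padding.
        raise ValueError("length must be between 1 and 43")
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        seed.encode("utf-8"),            # the secret: the seed sentence
        domain.lower().encode("utf-8"),  # the salt: the site's domain name
        200_000,                         # iteration count (assumed value)
    )
    return base64.urlsafe_b64encode(raw).decode("ascii")[:length]


seed = "correct horse battery staple"  # example seed, not a recommendation
pw_mail = site_password(seed, "mail.example.com")
pw_bank = site_password(seed, "bank.example.com")
```

The same seed and domain always yield the same password, so nothing but the seed, the site list and the algorithm needs to survive. The trade-off is that anyone who obtains the seed can regenerate every password.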
- shriek 8 years ago
  
  Wouldn't biometrics be good use case here since OP doesn't want to remember it?
  
simcop2387 8 years ago

For a year? A burned CD in a safe deposit box. Also a USB key there for convenience. Basically paying for physical security of the devices/data.

zyxzkz 8 years ago

I was gonna say, this sounds like Camlistore.

DiThi 8 years ago

Because it is! (edit: oh I see it's in the header)