Comment by josephg

6 years ago

I've been thinking on and off about this problem space for about a decade now - having worked on Google Wave, ShareJS and ShareDB. The architecture I want is something like this:

- My data is stored at a well known URL on a machine that I own. If people don't want to host their own stuff, they can rent another person's computer.

- We need a standard protocol for "data that changes over time". This is a really obvious point once you start thinking about it - REST doesn't support realtime edits, and websockets / zmq / etc are all too low-level. We need a standard way to express semantic changes (e.g. CRDT edits) and do catchup, which can work across multiple devices / applications / underlying protocols (a rough sketch of such a protocol follows this list). I've been working on this as part of Statecraft - https://github.com/josephg/statecraft - but it's still hidden behind all the documentation I haven't written yet.

- Then we need application-specific schemas to be published. E.g., there should be a standard calendar schema with events / reminders / whatever. Any calendar vendor could provide this. Then, on login, calendar apps could ask the user where their calendar data actually lives. Those apps could be web / mobile / desktop / whatever, because remember - we have a standard way to interoperate with data like this.

- Ideally the data would also be stored encrypted at rest (also sketched below). The server shouldn't need to be able to read any of the user's data.
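
A rough sketch of what such a protocol's messages might look like, using a calendar schema as the example document. None of this is Statecraft's actual API - every type and field name here (CalendarEvent, Operation, SubscribeRequest, CatchupResponse) is invented purely for illustration:

```typescript
// Hypothetical wire messages for a "data that changes over time" protocol.
// None of these names come from Statecraft; they are illustrative only.

// An application-specific schema, published so any calendar app can use it.
interface CalendarEvent {
  id: string;
  title: string;
  start: string;       // ISO 8601 timestamp
  end: string;
  reminders: number[]; // minutes before `start`
}

// A semantic change to one document, e.g. a CRDT or OT operation.
interface Operation {
  docId: string;
  version: number; // the version this op was generated against
  edits: Array<
    | { type: 'set'; path: string[]; value: unknown }
    | { type: 'splice'; path: string[]; index: number; remove: number; insert: unknown[] }
  >;
}

// The client subscribes from the last version it has seen...
interface SubscribeRequest {
  collection: 'calendar';
  fromVersion: number; // 0 means "send me everything"
}

// ...and the server replies with catch-up ops, then streams live ops.
interface CatchupResponse {
  ops: Operation[];
  currentVersion: number;
}
```

The important property is that the same messages can travel over any transport (websocket, HTTP long-polling, a local pipe), so web, mobile and desktop apps can all interoperate with the same data.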
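
And a minimal sketch of the encrypted-at-rest idea, assuming the client encrypts every change before uploading it. This uses the standard WebCrypto AES-GCM primitives; the helper names are made up:

```typescript
// Minimal sketch: encrypt an operation on the client before it is stored,
// so the server only ever sees ciphertext.

async function makeKey(): Promise<CryptoKey> {
  // In practice this key would be derived from a passphrase or synced
  // between the user's own devices, and never given to the server.
  return crypto.subtle.generateKey({ name: 'AES-GCM', length: 256 }, true, [
    'encrypt',
    'decrypt',
  ]);
}

async function encryptForServer(key: CryptoKey, op: unknown) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const plaintext = new TextEncoder().encode(JSON.stringify(op));
  const ciphertext = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext);
  // The server stores { iv, ciphertext } and can order and replicate it,
  // but cannot read the operation itself.
  return { iv, ciphertext: new Uint8Array(ciphertext) };
}
```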

You could build a peer-to-peer system whereby my desktop apps and phone share data with one another. But ideally, data should be accessible from any device at any time without worrying about whether your laptop is on or off. For that we need servers. You could make a single persistent server be a peer in a CRDT-based cluster of devices. That might be better - but it's harder to implement and might run into issues with bandwidth and size (I don't want my phone to sync my whole photo library, etc). There are some generally unsolved problems here, but I don't think they're beyond us.
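
One hypothetical way to handle the bandwidth and size concern: treat the always-on server as just another peer that keeps a full replica, while each device subscribes to a filtered subset. The shapes below are invented for illustration, not taken from any existing system:

```typescript
// Hypothetical per-device replication config: the always-on server is just
// another peer, but it keeps a full replica; the phone syncs only a subset.
interface CollectionSubscription {
  name: string;
  // Optional partial-sync filter, e.g. only recent photos.
  filter?: { field: string; since: string };
}

interface PeerConfig {
  deviceId: string;
  fullReplica: boolean; // the server: true, the phone: false
  collections: CollectionSubscription[];
}

const phone: PeerConfig = {
  deviceId: 'my-phone',
  fullReplica: false,
  collections: [
    { name: 'calendar' },
    { name: 'photos', filter: { field: 'takenAt', since: '2019-01-01' } },
  ],
};

const homeServer: PeerConfig = {
  deviceId: 'home-server',
  fullReplica: true,
  collections: [{ name: 'calendar' }, { name: 'photos' }],
};
```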

If you're working on this problem and want to chat, throw me an email - I'm me@josephg.com.

I feel the dat project (https://datproject.org/) ticks some of the boxes you'd want in a base protocol. Dat itself is an easy-to-use tool for syncing large files between machines, but its core (http://awesome.datproject.org/hypercore) gives you everything you need to build on top of.

With dat you have:

- URLs to individual files, with the understanding that they will change over time
- built-in encryption and non-repudiability of every change
- storage wherever you want; no peer is more important than any other
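
As a rough sketch of building on that core, here's what an append-only change log on hypercore might look like. hypercore ships without TypeScript typings, and the entry shape used here is made up for illustration:

```typescript
// Minimal sketch of an append-only change log on hypercore (the core of dat).
// hypercore has no bundled TypeScript typings, so the import is untyped here.
// @ts-ignore
import hypercore from 'hypercore';

const feed = hypercore('./calendar-log', { valueEncoding: 'json' });

feed.ready(() => {
  // feed.key is the public key other peers use to find and verify this log.
  console.log('share this key:', feed.key.toString('hex'));

  // Every append is signed with the matching secret key, which is where the
  // non-repudiability of each change comes from.
  feed.append({ type: 'event.create', event: { id: 'e1', title: 'Dentist' } });

  // Read the log back; live: true keeps streaming as new entries arrive.
  feed.createReadStream({ live: true }).on('data', (entry: unknown) => {
    console.log('change:', entry);
  });
});
```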

At least the calendar aspect of this has existed for over a decade. Similarly, there are standardized protocols for handling server-based contact lists, email, and chat. Sure, most of those protocols are crappy, but you'll just use a library anyway, so that's not an excuse. The problem is not on the technical side: basically all the big providers have decided a walled garden is the best long-term strategy and have been dropping support for any standards-compliant data sharing.

14 years ago I was syncing my contact lists from my personal server to a flip phone over the vCard format (I think? I don't remember exactly, but the contacts equivalent of CalDAV). Nowadays you need to OAuth into Google or whatever other centralized provider a user might have, using whichever proprietary API they may or may not give you access to.

As with so many other things, a shiny new technical solution won’t change the reality that big companies are not incentivized to support interoperability.

  • Apple's support for CalDAV and CardDAV is in good shape. The sync mechanism isn't as good as the native iCloud integration, but I suspect that's down to the protocol itself.

I really get the motivation, and I'd pay in both convenience and money. But what's the business model, and why would the existing user-data-silo megacorps ever want to interoperate? Or is it just the same old third way of self-hosting for the motivated minority?

  • If it's easy enough, then given the ever-growing concern about surveillance and dark patterns, it might not be only a minority that's motivated.

> We need a standard protocol for "data that changes over time".

This already exists: it's called a DVCS.

  • DVCSes focus exclusively on files rather than JSON (or whatever). They're diff-based rather than operation-based (a small sketch of that difference follows below). They aren't designed to work in real time. They require humans to fix merge conflicts (even many trivial ones). They store every change for all time (and require that you do so by design). And commits are a heavyweight operation: if you made a git commit for every keystroke, git would struggle. Also, the network protocols aren't published, and usually aren't designed to work over websockets / etc for web apps.

    We need something like a dvcs, but not exactly that.
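
To make the diff-based vs. operation-based distinction concrete, here's a rough sketch. The TextOp type is invented for illustration; the point is only that operations carry intent (where and what changed), which is what lets concurrent edits merge without a human resolving conflicts:

```typescript
// Diff-based (what a DVCS stores): a textual patch against a known base.
const gitStyleDiff = `
-Meet at 9am
+Meet at 10am
`;

// Operation-based (what realtime/CRDT systems exchange): each edit is a tiny,
// self-describing operation that says where and what changed.
type TextOp =
  | { type: 'insert'; pos: number; text: string }
  | { type: 'delete'; pos: number; len: number };

// The same edit as above, expressed as operations on "Meet at 9am".
const ops: TextOp[] = [
  { type: 'delete', pos: 8, len: 1 },     // remove the "9"
  { type: 'insert', pos: 8, text: '10' }, // type "10" in its place
];

// Applying ops is mechanical, which is why two devices editing different
// parts of a document can merge their histories without a manual conflict step.
function apply(doc: string, op: TextOp): string {
  return op.type === 'insert'
    ? doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
    : doc.slice(0, op.pos) + doc.slice(op.pos + op.len);
}
```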