Comment by josephg

6 years ago

I've been thinking on and off about this problem space for about a decade now - having worked on Google Wave, ShareJS and ShareDB. The architecture I want is something like this:

- My data is stored at a well known URL on a machine that I own. If people don't want to host their own stuff, they can rent another person's computer.

- We need a standard protocol for "data that changes over time". This is a really obvious point once you start thinking about it - REST doesn't support realtime edits, and websockets / zmq / etc are all too low-level. We need a standard way to express semantic changes (e.g. CRDT edits) and do catchup, which can work across multiple devices / applications / underlying protocols (a rough sketch of such a protocol follows this list). I've been working on this as part of Statecraft - https://github.com/josephg/statecraft - but it's still hidden behind all the documentation I haven't written yet.

- Then we need application-specific schemas to be published. E.g., there should be a standard calendar schema with events / reminders / whatever. Any calendar vendor could provide this. Then, on login, calendar apps could ask the user where their calendar data actually lives. Those apps could be web / mobile / desktop / whatever, because remember - we have a standard way to interoperate with data like this.

- Ideally the data would also be stored encrypted at rest (also sketched below). The server shouldn't need to be able to read any of the user's data.
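
A rough sketch of what such a protocol's messages might look like, using a calendar schema as the example document. None of this is Statecraft's actual API - every type and field name here (CalendarEvent, Operation, SubscribeRequest, CatchupResponse) is invented purely for illustration:

```typescript
// Hypothetical wire messages for a "data that changes over time" protocol.
// None of these names come from Statecraft; they are illustrative only.

// An application-specific schema, published so any calendar app can use it.
interface CalendarEvent {
  id: string;
  title: string;
  start: string;       // ISO 8601 timestamp
  end: string;
  reminders: number[]; // minutes before `start`
}

// A semantic change to one document, e.g. a CRDT or OT operation.
interface Operation {
  docId: string;
  version: number; // the version this op was generated against
  edits: Array<
    | { type: 'set'; path: string[]; value: unknown }
    | { type: 'splice'; path: string[]; index: number; remove: number; insert: unknown[] }
  >;
}

// The client subscribes from the last version it has seen...
interface SubscribeRequest {
  collection: 'calendar';
  fromVersion: number; // 0 means "send me everything"
}

// ...and the server replies with catch-up ops, then streams live ops.
interface CatchupResponse {
  ops: Operation[];
  currentVersion: number;
}
```

The important property is that the same messages can travel over any transport (websocket, HTTP long-polling, a local pipe), so web, mobile and desktop apps can all interoperate with the same data.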
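
And a minimal sketch of the encrypted-at-rest idea, assuming the client encrypts every change before uploading it. This uses the standard WebCrypto AES-GCM primitives; the helper names are made up:

```typescript
// Minimal sketch: encrypt an operation on the client before it is stored,
// so the server only ever sees ciphertext.

async function makeKey(): Promise<CryptoKey> {
  // In practice this key would be derived from a passphrase or synced
  // between the user's own devices, and never given to the server.
  return crypto.subtle.generateKey({ name: 'AES-GCM', length: 256 }, true, [
    'encrypt',
    'decrypt',
  ]);
}

async function encryptForServer(key: CryptoKey, op: unknown) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const plaintext = new TextEncoder().encode(JSON.stringify(op));
  const ciphertext = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext);
  // The server stores { iv, ciphertext } and can order and replicate it,
  // but cannot read the operation itself.
  return { iv, ciphertext: new Uint8Array(ciphertext) };
}
```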

You could build a peer-to-peer system whereby my desktop apps and phone share data with one another. But ideally, data should be accessible from any device at any time without worrying about whether your laptop is on or off. For that we need servers. You could make a single persistent server be a peer in a CRDT-based cluster of devices. That might be better - but it's harder to implement and might run into issues with bandwidth and size (I don't want my phone to sync my whole photo library, etc). There are some generally unsolved problems here, but I don't think they're beyond us.
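
One hypothetical way to handle the bandwidth and size concern: treat the always-on server as just another peer that keeps a full replica, while each device subscribes to a filtered subset. The shapes below are invented for illustration, not taken from any existing system:

```typescript
// Hypothetical per-device replication config: the always-on server is just
// another peer, but it keeps a full replica; the phone syncs only a subset.
interface CollectionSubscription {
  name: string;
  // Optional partial-sync filter, e.g. only recent photos.
  filter?: { field: string; since: string };
}

interface PeerConfig {
  deviceId: string;
  fullReplica: boolean; // the server: true, the phone: false
  collections: CollectionSubscription[];
}

const phone: PeerConfig = {
  deviceId: 'my-phone',
  fullReplica: false,
  collections: [
    { name: 'calendar' },
    { name: 'photos', filter: { field: 'takenAt', since: '2019-01-01' } },
  ],
};

const homeServer: PeerConfig = {
  deviceId: 'home-server',
  fullReplica: true,
  collections: [{ name: 'calendar' }, { name: 'photos' }],
};
```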

If you're working on this problem and want to chat, throw me an email - I'm me@josephg.com.

I feel the dat project (https://datproject.org/) ticks some of the boxes you'd want in a base protocol. Dat itself is an easy-to-use tool for syncing large files between machines, but its core (http://awesome.datproject.org/hypercore) gives you everything you need to build on top of.

With dat you have:

- URLs to individual files, with the understanding that they will change over time
- built-in encryption and non-repudiability of every change
- storage wherever you want; no peer is more important than any other
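
As a rough sketch of building on that core, here's what an append-only change log on hypercore might look like. hypercore ships without TypeScript typings, and the entry shape used here is made up for illustration:

```typescript
// Minimal sketch of an append-only change log on hypercore (the core of dat).
// hypercore has no bundled TypeScript typings, so the import is untyped here.
// @ts-ignore
import hypercore from 'hypercore';

const feed = hypercore('./calendar-log', { valueEncoding: 'json' });

feed.ready(() => {
  // feed.key is the public key other peers use to find and verify this log.
  console.log('share this key:', feed.key.toString('hex'));

  // Every append is signed with the matching secret key, which is where the
  // non-repudiability of each change comes from.
  feed.append({ type: 'event.create', event: { id: 'e1', title: 'Dentist' } });

  // Read the log back; live: true keeps streaming as new entries arrive.
  feed.createReadStream({ live: true }).on('data', (entry: unknown) => {
    console.log('change:', entry);
  });
});
```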

At least the calendar aspect of this has existed for over a decade. Similarly, there are standardized protocols for handling server-based contact lists, email, and chat. Sure, most of those protocols are crappy, but you'll just use a library anyway, so that's not an excuse. The problem is not on the technical side: basically all the big providers have decided a walled garden is the best long-term strategy and have been dropping support for any standards-compliant data sharing.

14 years ago I was syncing my contact lists from my personal server to a flip phone over the vCard format (I think? I don't remember exactly, but the contacts equivalent of CalDAV). Nowadays you need to OAuth into Google or whatever other centralized provider a user might have, using whichever proprietary API they may or may not give you access to.

As with so many other things, a shiny new technical solution won’t change the reality that big companies are not incentivized to support interoperability.

  • Apple's support for CalDAV and CardDAV is in good shape. The sync mechanism isn't as good as the native iCloud integration, but I suspect that's down to the protocol itself.

I really get the motivation, and I'd pay in both convenience and money. But what's the business model, and why would the existing user-data-silo megacorps ever want to interoperate? Or is it just the same old third way of self-hosting for the motivated minority?

  • If it's easy enough, then given the ever-growing concern about surveillance and dark patterns, it might not be only a minority that's motivated.

> We need a standard protocol for "data that changes over time".

This already exists: it's called a DVCS.

  • DVCSes focus exclusively on files rather than JSON (or whatever). They're diff-based rather than operation-based (a small sketch of that difference follows below). They aren't designed to work in real time. They require humans to fix merge conflicts (even many trivial ones). They store every change for all time (and require that you do so by design). And commits are a heavyweight operation: if you made a git commit for every keystroke, git would struggle. Also, the network protocols aren't published, and usually aren't designed to work over websockets / etc for web apps.

    We need something like a dvcs, but not exactly that.
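
To make the diff-based vs. operation-based distinction concrete, here's a rough sketch. The TextOp type is invented for illustration; the point is only that operations carry intent (where and what changed), which is what lets concurrent edits merge without a human resolving conflicts:

```typescript
// Diff-based (what a DVCS stores): a textual patch against a known base.
const gitStyleDiff = `
-Meet at 9am
+Meet at 10am
`;

// Operation-based (what realtime/CRDT systems exchange): each edit is a tiny,
// self-describing operation that says where and what changed.
type TextOp =
  | { type: 'insert'; pos: number; text: string }
  | { type: 'delete'; pos: number; len: number };

// The same edit as above, expressed as operations on "Meet at 9am".
const ops: TextOp[] = [
  { type: 'delete', pos: 8, len: 1 },     // remove the "9"
  { type: 'insert', pos: 8, text: '10' }, // type "10" in its place
];

// Applying ops is mechanical, which is why two devices editing different
// parts of a document can merge their histories without a manual conflict step.
function apply(doc: string, op: TextOp): string {
  return op.type === 'insert'
    ? doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
    : doc.slice(0, op.pos) + doc.slice(op.pos + op.len);
}
```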