Love this and it aligns well with the work I'm currently doing.
I believe local-first peer-to-peer networks have been close enough to delivering feature parity with centralized Application Service Providers for 5+ years now; the industry just hasn't caught up.
For social media platforms like Facebook and X, their "killer app" features are all solvable with hash-based data structures (chains, trees, forests), p2p capability systems, and gossip protocols. Identity systems are solvable in ways that non-technical people can understand.
Part of the solution was the p2p stack maturing. The other part is that centralized solutions have normalized a lot of user flows that p2p can be competitive with; users have been trained up on a lot of patterns that previously wouldn't have been palatable in the market.
There is extreme market demand for it, and the zeitgeist is tuned into the societal and financial failings of ASPs right now.
Facebook is trying to act as the free relay and archivist for the entire world's social graph. There are a lot of hard problems to solve there, which, personally, I think contributed to social networking silently dying and degenerating into the monstrosity that Social Media is.
Once this stuff starts making it into end-user products, I suspect it's going to be a one-way door. P2P can provide user experiences that centralized services can't match - whether for technical, financial, or legal reasons.
We are in a sweet spot of opportunity for building the web as it was promised to us.
If folks working in this space want to compare notes, or are looking for work on a well funded no-BS R&D team developing this space, my email is in my bio.
It’s not a technology problem. It never really has been. It’s a business problem, which is another way of saying it's a people problem.
People have been conditioned to believe that all software must be free, but that SaaS is fine. They are willing to rent software in the cloud but not to pay for it at the endpoint.
Add to this the fact that the cloud is the ultimate DRM. Software can't be pirated if the user doesn't even have the binaries, or possession of their own data.
SaaS provides inescapable recurring revenue and very strong lock-in. It's massively superior from a business-model point of view. Meanwhile the (much more) evil twin of SaaS, namely "free" surveillance- and addiction-driven apps, dominates B2C.
People might want local-first software, but if they won't pay for it, that doesn't matter. The whole industry will wrap itself around whatever model pays. Everything is all about cloud and SaaS not because these are technically superior or better for the user, but because that's where you get cash flow.
It's deeper than cash flow.
As soon as computing becomes a primary job and not a toy... people clam up about their ideas and stick to safe, generic, bland output.
Computing-as-a-job for random plebs has sacrificed the protection of ideas in pursuit of more innovation and freedom, resulting in the least innovation and freedom we've seen in decades.
Until we can walk on water again, this spiral will not end. Why people cannot see this spiral and treat it like an existential threat to the web, I have no idea.
A 'new kind of network' is surface-level analysis; we need to stop computers sucking the life (and consciousness) out of everything they touch.
Cash flow is way easier when you can work on something novel and deliver it to the end customer without the concepts being stolen along the way.
Yes, for sure SaaS is easier and cheaper (for one month :) than the big-box software you saw years ago.
I am concerned by the threat this model poses to open source.
Can you recommend some p2p technologies that people can start building with today?
E.g., I'm tempted to go with a stack of a centralized Postgres plus a SQLite database for each user (à la ElectricSQL, mvsql, etc.), because I want users to have their data local as much as possible (in their own SQLite db), but a centralized Postgres database still feels logistically necessary to coordinate things like user accounts and sharing.
I'm all ears on better p2p solutions. I've looked into things like Matrix and Nostr, but they don't feel like the solution I'm looking for yet. I'm eager, though, to find something that can keep data local and handle sharing in a more p2p way.
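For what it's worth, the hybrid you describe doesn't need much machinery on the client. Here's a minimal sketch, assuming better-sqlite3 and a hypothetical /push endpoint on the central server (the endpoint and payload shape are my inventions, not any product's API):

    import Database from 'better-sqlite3';

    // Each user gets their own local SQLite file.
    const db = new Database('user-data.db');
    db.exec(`
      CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT);
      CREATE TABLE IF NOT EXISTS outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op  TEXT NOT NULL -- JSON-encoded change, replayed on the server
      );
    `);

    // Every local write also records an outbox entry, atomically.
    const write = db.transaction((id: string, body: string) => {
      db.prepare('INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)').run(id, body);
      db.prepare('INSERT INTO outbox (op) VALUES (?)')
        .run(JSON.stringify({ table: 'notes', id, body }));
    });
    write('n1', 'works offline first');

    // Periodically drain the outbox to the central Postgres (hypothetical endpoint).
    async function push(serverUrl: string): Promise<void> {
      const rows = db.prepare('SELECT seq, op FROM outbox ORDER BY seq').all() as
        { seq: number; op: string }[];
      if (rows.length === 0) return;
      await fetch(`${serverUrl}/push`, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(rows.map(r => JSON.parse(r.op))),
      });
      db.prepare('DELETE FROM outbox WHERE seq <= ?').run(rows[rows.length - 1].seq);
    }

The nice property is that the central Postgres only ever sees an ordered stream of ops per user, so accounts and sharing stay centralized while the source of truth stays on-device.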
We are building our overlay network with libp2p.
Our current gossip structure for identity is a hash tree. The Secure Scuttlebutt research paper by Tarr is a good place to start for that.
For capabilities, we are (currently) using the same concepts as UCAN.
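To make the identity idea concrete, here's a toy hash-chained log; this is my own generic illustration of the technique, not the poster's actual design. Each entry commits to its predecessor, so peers gossiping only the head hash can detect tampering or forks:

    import { createHash } from 'node:crypto';

    interface Entry {
      prev: string;    // hash of the previous entry ('' for the genesis entry)
      payload: string; // e.g. a signed key rotation or profile update
    }

    const hash = (e: Entry) =>
      createHash('sha256').update(e.prev).update(e.payload).digest('hex');

    // Append-only log: each new entry commits to the one before it.
    function append(log: Entry[], payload: string): Entry[] {
      const prev = log.length ? hash(log[log.length - 1]) : '';
      return [...log, { prev, payload }];
    }

    // Verification just replays the hashes; editing any historical entry
    // changes every subsequent hash, so forks and tampering are detectable.
    function verify(log: Entry[]): boolean {
      return log.every((e, i) => e.prev === (i === 0 ? '' : hash(log[i - 1])));
    }

    let log: Entry[] = [];
    log = append(log, 'genesis: pubkey=...');
    log = append(log, 'rotate-key: ...');
    console.log(verify(log), hash(log[log.length - 1])); // true, plus the current head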
CouchDB/PouchDB remains one of the best: it's super easy to set up and is production-ready, but it's going to be JSON docs with no transactions, so it can be limiting.
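The core of it is pleasantly small. A sketch of two-way live replication using PouchDB's documented sync API (the database names and URL are placeholders):

    import PouchDB from 'pouchdb';

    const local = new PouchDB('flashcards');          // stored in IndexedDB
    const remote = new PouchDB('https://couch.example.com/flashcards');

    // Continuous bidirectional replication; retries when connectivity returns.
    local.sync(remote, { live: true, retry: true })
      .on('change', info => console.log('replicated', info.direction))
      .on('error', err => console.error('sync error', err));

    // Writes go to the local database first and replicate in the background.
    await local.put({ _id: 'card:1', front: 'bonjour', back: 'hello' });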
Yjs and Automerge emerged as solutions combining CRDTs and content transfer, and they look really promising. There is a Yrs (Rust) version of Yjs if that's better for you.
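For a taste of the model, here's Automerge's documented change/merge API in action: two copies of a document diverge and then merge deterministically (the Doc shape is made up):

    import * as Automerge from '@automerge/automerge';

    type Doc = { cards: { front: string }[] };

    let docA = Automerge.change(Automerge.init<Doc>(), d => { d.cards = []; });

    // Simulate a second device diverging from the same state...
    let docB = Automerge.merge(Automerge.init<Doc>(), docA);
    docA = Automerge.change(docA, d => { d.cards.push({ front: 'hello' }); });
    docB = Automerge.change(docB, d => { d.cards.push({ front: 'bonjour' }); });

    // ...and merging back together: both edits survive, in a deterministic order.
    const merged = Automerge.merge(docA, docB);
    console.log(merged.cards.length); // 2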
I've always dreamt of building something on top of Syncthing, i.e. something that would use file synchronization. It's more versatile and will definitely last longer than anything else, and it has some built-in capabilities for letting a third party help with transport without being able to read the content.
I recently came across https://github.com/superfly/corrosion , a service-discovery and state-management tool that works completely p2p. CR-SQLite, in particular, allows tables from multiple databases to be merged thanks to CRDTs. I'm sure there's a lot to build on top of it.
I feel like you're not really interested in full p2p but want some central point to manage auth, so I'd investigate CouchDB/PouchDB first.
Definitely check out SocketSupply! https://socketsupply.co/guides/#p2p-guide
My side project is a personal budgeting web app [0] based on these principles. The technologies to support this type of architecture are available and fairly mature.
For example in my budgeting app:
- All of the user's budget data is stored client side in IndexedDB
- Offline support via a service worker
- Peer to peer synchronization between devices via WebRTC data channel
- Lightweight IdP server which also handles the WebRTC signalling (via web sockets)
This setup is nice because the user's data never touches the server and very little server side infrastructure is needed (none at all if you don't need data synchronization). One downside with the data sync is that both devices must be online with the app open at the same time to sync, but perhaps this could be improved with web workers?
[0] https://dollero.app
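For anyone curious what the WebRTC leg of such a setup looks like, here's a bare-bones sketch of one peer's side; the signalling WebSocket URL and message shapes are placeholders for whatever the IdP server actually speaks:

    // The IdP server relays these messages between the two peers (hypothetical URL).
    const ws = new WebSocket('wss://idp.example.com/signal');
    const signal = (msg: object) => ws.send(JSON.stringify(msg));

    const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
    const channel = pc.createDataChannel('sync');

    channel.onopen = () => channel.send(JSON.stringify({ type: 'hello' }));
    channel.onmessage = e => console.log('remote change', JSON.parse(e.data));

    pc.onicecandidate = e => e.candidate && signal({ candidate: e.candidate });

    ws.onmessage = async e => {
      const msg = JSON.parse(e.data);
      if (msg.candidate) await pc.addIceCandidate(msg.candidate);
      if (msg.sdp) {
        await pc.setRemoteDescription(msg.sdp);
        if (msg.sdp.type === 'offer') { // the callee answers the offer
          await pc.setLocalDescription(await pc.createAnswer());
          signal({ sdp: pc.localDescription });
        }
      }
    };

    // The initiating peer starts the handshake.
    export async function start(): Promise<void> {
      await pc.setLocalDescription(await pc.createOffer());
      signal({ sdp: pc.localDescription });
    }

As far as I know, a service worker can't hold an RTCPeerConnection alive in the background, which is why both devices have to be open at once and why a store-and-forward relay usually creeps back in.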
Looks visually quite similar to Actual[0][1][2] (open-source, local-first). Did one influence the other? Are they both inspired by some other budget software?
[0] https://actualbudget.com/
[1] https://github.com/actualbudget/actual
[2] https://github.com/actualbudget/actual/blob/d1e57340b88960d0...
I discovered Actual while I was working on Dollero although it was not open source at the time. I believe we were both influenced by YNAB classic, hence the similar layout. I imagine hosting Actual was time consuming and expensive for the author because all syncing appears to rely on a centralized server and database. I'm hoping to avoid this with Dollero because the peer-to-peer syncing requires a much simpler server and is cheap and easy to host.
> but perhaps this could be improved with web workers?
This also means that I'd need a constantly running web worker on my phone/tablet, etc.? In my spare time I am also trying to solve this problem of "data never touching a server", but I believe we're not there yet, despite all the devices we have: sometimes the smart watch is off, sometimes the laptop is off, and so on. So you need a mechanism similar to Google Drive, and who has never had sync issues with that? And that's a super-centralized solution. It's super hard to "merge" in a distributed async system.
You can add push notifications or whatever; the issue will be resolving the conflicts. I think the local-first ethos is valuable in some situations (e.g. resilient communications à la Reticulum network), but it fails to address another, bigger problem. Perhaps someone with knowledge of CRDTs or data structures knows better how much of this timestamps can catch. Maybe one day we'll have transformer models that take the data on both ends and just 'hallucinate' a merge that you can then manually review (in the case of code, run the tests)... I don't know.
You could use an IMAP email account to send synchronization messages. Then the peers do not need to be online at the same time.
This is really well done. Nice job.
Thanks, I appreciate your feedback.
As someone who watches the CRDT and local-first space very closely: it's quite a bit harder than doing it the old-fashioned client-server way, because you have to architect your systems to sync seamlessly across offline and online. I've used tRPC and GraphQL before, and I've been thinking about something with the same DX as the client-server model, such as tRPC or GraphQL, but where there is no difference between client and server state; there is only "state." This makes sense because, for local-first, the entire state of the app must be replicated locally anyway.
Recently there have been CRDT solutions that try to solve this problem, however, such as Triplit [0], or ElectricSQL [1].
[0] https://news.ycombinator.com/item?id=37584049
If the same effort were put into software stacks for this as gets put into stacks for cloud-first development and massively scalable devops, there would be lots of canned solutions to all these problems, very mature and stable. You wouldn't have to know what a CRDT is or how NAT hole punching works any more than you need to know how a data-center fabric or a SAN works to use the cloud.
See my other reply: https://news.ycombinator.com/item?id=37743517#37746101
Meteor wrapped the MongoDB API for this purpose. You are working with collections and can run the same queries over them, regardless of whether you are connected to a DB instance or the browser's local storage.
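Roughly, the pattern looks like this (Mongo.Collection is Meteor's documented API; the collection name and fields here are made up):

    import { Mongo } from 'meteor/mongo';

    // The same collection definition is loaded on both client and server.
    export const Tasks = new Mongo.Collection('tasks');

    // In the browser this query runs against Minimongo (the local cache);
    // on the server it runs against the real MongoDB instance.
    const openTasks = Tasks.find({ done: false }, { sort: { createdAt: -1 } }).fetch();

    // Client-side writes hit the local cache immediately (latency
    // compensation) and are replayed against the server in the background.
    Tasks.insert({ title: 'Try local-first', done: false, createdAt: new Date() });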
For CouchDB an equivalent exists in the form of PouchDB: https://pouchdb.com/
Interestingly, the UK Post Office injustice scandal seems to have occurred because they (and their contractors) couldn't get to the bottom of errors in a local-store distributed system, and ended up blaming (and prosecuting!) users. And that was just a distributed, append-only message store, not something as complex as CRDTs.
Yes, people think local-first sounds fun, but in mission-critical scenarios it's horrible. The Post Office abandoned it as soon as networks got good enough (~2010), with the old local-first replicating version being phased out as "Horizon Legacy" in favour of a new always-online version.
It didn't stop the bugs. There are lots of ways to screw up even in a fully online system. It certainly helped though!
Also https://github.com/vlcn-io/cr-sqlite/ which is SQLite + CRDTs
Runs/syncs to the browser too which is just lovely.
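The usage pattern, as I understand it from the repo, is roughly the following; take the details with a grain of salt and check the README, and note the table here is made up:

    import Database from 'better-sqlite3';

    const db = new Database('app.db');
    db.loadExtension('./crsqlite'); // path to the cr-sqlite loadable extension

    db.exec(`CREATE TABLE IF NOT EXISTS todos (id PRIMARY KEY, title TEXT, done INTEGER)`);
    db.prepare(`SELECT crsql_as_crr('todos')`).get(); // upgrade the table to a replicated relation

    db.prepare(`INSERT INTO todos VALUES (?, ?, ?)`).run('t1', 'ship it', 0);

    // Changes since a known version can be read out and shipped to a peer,
    // and a peer applies them by inserting into the same virtual table.
    const changes = db.prepare(`SELECT * FROM crsql_changes WHERE db_version > ?`).all(0);
    console.log(changes.length);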
This article has been quite the inspiration for many projects and progress on this front. I think we're seeing more and more developments around CRDTs and local-first frameworks / applications.
I'm working on a few projects in this area:
- https://www.typecell.org - Notion meets Notebook-style live programming for TypeScript / React
- https://www.blocknotejs.org - a rich text editor built on TipTap / Prosemirror that supports Yjs for local-first collaboration
- https://syncedstore.org - a wrapper around Yjs for easier development
In my experience so far, some things get more complicated when building a local-first application, and some things get a lot easier. What gets easier is that once you've modeled and implemented the data-layer (which does require you to rethink / unlearn a few principles), you don't need to worry about data-fetching, errors etc. as much as in a regular "API-based" app.
Another interesting video I recommend on this topic is about Linear's "Sync Engine" which employs some of the local-first techniques as well: https://www.youtube.com/watch?v=Wo2m3jaJixU
>https://www.typecell.org
I took a look at the landing page out of curiosity, just an FYI but at first glance there's nothing that indicates to me that this is not a regular SaaS app.
Specifically, unless I'm missing something, nothing in the text jumped out at me indicating the app satisfies this condition stated in the article:
>for good offline support it is desirable for the software to run as a locally installed executable on your device
Might want to make this feature more prominent if you support it.
Thanks, great feedback!
Although it's entirely architected on a local-first stack, I indeed haven't shipped the main benefit of this, a locally installable app. There's a WIP PR here that adds PWA support: https://github.com/TypeCellOS/TypeCell/pull/352. I'll highlight this more when this is merged.
Nevertheless, some of the benefits are already noticeable and come "out of the box" with building on a local-first architecture, even if not shipping an executable yet:
- multiplayer sync
- speed: documents are loaded from local storage initially if they have been loaded before, and changes sync in after that
In the future (when there's an installable app), I also want to enable saving / loading from the file system, so that it's completely transparent where your data is.
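If it helps, the browser primitive for that last part already exists in Chromium-based browsers: the File System Access API. A sketch of saving and loading a serialized document snapshot (file name and types are illustrative):

    // Save a document snapshot (e.g. a serialized Yjs update) to a user-chosen file.
    // (TypeScript may need the wicg-file-system-access typings for these pickers.)
    async function saveSnapshot(bytes: Uint8Array): Promise<void> {
      const handle = await window.showSaveFilePicker({ suggestedName: 'document.typecell' });
      const writable = await handle.createWritable();
      await writable.write(bytes);
      await writable.close();
    }

    // Load a snapshot back from disk.
    async function loadSnapshot(): Promise<Uint8Array> {
      const [handle] = await window.showOpenFilePicker();
      const file = await handle.getFile();
      return new Uint8Array(await file.arrayBuffer());
    }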
Only three past submissions with discussions (though quite a few comments); the most recent discussion was about two years ago:
https://news.ycombinator.com/item?id=19804478 - May 3, 2019 (191 comments)
Thanks! Macroexpanded:
The cloud is a prison. can the local-first software movement set us free? - https://news.ycombinator.com/item?id=19804478 - May 2019 (190 comments)
I've switched to building VS Code extensions for my local-first personal side projects. The reason: no server to run and all my data is in plain text so I can put it in source control.
I've built and use a time tracker, a double entry accounting system (using a Beancount(ish) syntax) and an activity tracker. Two of the three of those were using older CLI code I wrote a few years ago adapted into an extension.
Adding a webview that tracks the content of a tab is pretty simple. It also gives me an excuse to write parsers (some of my favorite code), which lets me turn the text into a data structure that my webview code (in my case React) can render on each keystroke. I don't get fancy: I parse the full buffer on each keystroke, and with proper React state management the DOM updates are optimized.
I don't think I'll ever open source or publish the extensions themselves as they are too specialized for what I need and want but I may (if I have time) generalize the webview + tab change/rename/delete/close tracking code into an open source library.
I'd recommend this approach if it fits with your side projects. It's an easy way to get a web GUI for your ideas without adding to the graveyard of broken domain dreams.
I love this article and its ideas, but let's talk about where it falls short in reality.
1. CRDTs are a pain, especially rich text. Schema migrations are a pain. And eventually, when you have a very successful product, you'll have issues with vector clocks that grow linearly and without bound with every user who pops by to collaborate (see the sketch after this list). If you're successful enough to have a 2000-person workspace, you won't be able (or want) to sync an entire workspace of data in realtime to everyone's local machine, so the whole system devolves into a glorified local cache (nothing wrong with that, but it's not a CRDT and arguably isn't local-first).
2. Freemium SaaS has proven to be the most lucrative business strategy. Users get stuff for free which widens the top of the funnel. Students go into the industry and bring their tools with them, and then you charge enterprises for collaborative workspaces per-seat (I call this the Slack playbook). Given the challenges of CRDTs outlined above, you're making this job really hard.
3. If you're making all of your money on enterprises, they would prefer that a rogue employee can't save all of the company information offline when they leave. That's why you can't export your company Gmail or Notion workspace. Employees will pretty much always be online so local-first isn't really necessary for them, although everyone does want the performance of local-first.
4. There are large swathes of problems that are not suited for local-first -- basically any application that requires transactional writes. You can't enforce simple invariants like "every user must have a unique username". Almost every application has requirements like this unless you specifically design around them. Sometimes, they can't be designed around at all.
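To make point 1 concrete, here's a toy vector clock. This is my own generic illustration, not any particular library's implementation: every actor who ever touches a document adds a permanent entry, which is why the clocks grow without bound in a busy workspace.

    type VectorClock = Map<string, number>; // actorId -> counter

    // Each local event bumps the actor's own counter.
    function tick(clock: VectorClock, actor: string): void {
      clock.set(actor, (clock.get(actor) ?? 0) + 1);
    }

    // Merging takes the elementwise max; note the union of actor sets.
    function merge(a: VectorClock, b: VectorClock): VectorClock {
      const out = new Map(a);
      for (const [actor, n] of b) out.set(actor, Math.max(out.get(actor) ?? 0, n));
      return out;
    }

    // 2000 collaborators who each edited once = 2000 permanent entries,
    // carried in the metadata of every subsequent sync.
    const doc: VectorClock = new Map();
    for (let i = 0; i < 2000; i++) tick(doc, `user-${i}`);
    console.log(doc.size); // 2000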
At the end of the day, I like to think of choosing local-first through the lens of the "technology budget". If you want to build something local-first, then that's your entire technology budget and then some. Do you want to bet your entire business on that? Or do you want to use tried-and-true tools and spend that technology budget on something else that moves the needle?
Developers love to know how something is built, but users really don't care so long as it works great! Example: Obsidian.md is awesome, and local-first. But it's mostly developers who use it -- everyone else just uses Notion.
I still love Local-First, but as far as I'm concerned, it's a developer honeytrap.
Is there any equivalent for apps where the code is open source and the data is totally under your control but in the cloud? I'd love something that's as easy to install as an Android app, but ran on a server someone else maintains. This is certainly possible now with enough effort, but there doesn't seem to be anything nearly as standard and easy as PHP on a shared host used to be.
"unhosted web apps" is a series of articles that explore some avenues to this: https://unhosted.org/
Thanks! So many good articles, great ideas there!
Well, running something "in the cloud" can be arbitrarily complicated, so that's a kinda ill-defined question.
The simplest equivalent is just regular apps that save data to somewhere that's backed up.
I work with an enterprise that takes this approach, albeit with on-prem warm & cold data.
It seems to make a lot of sense for them, which is probably why cloud providers discourage it.
Data is the important part. Cloud compute can be swapped.
Omnifocus does quite well with this. They offer a sync service, but it's just WebDAV syncing a few XML files, so you can easily host your own if you want, or throw the files in iCloud.
People who are serious about this topic should start by studying past examples. Once upon a time all software was local first, and there's plenty to study there about what changed and why.
The masters of this game are Apple. Bar none. All their software is local first and asynchronously syncs to iCloud (optionally!) using CloudKit, which is a relatively consistent replicated database. This design is hard to master but worth starting with. They also support P2P that works (Bonjour and friends). Apple can do this because they're firmly rooted in the software development milieu of the 80s and 90s. They never got the memo about the web taking over and revel in writing high quality desktop apps. Users love them too. Developers like to pretend this ecosystem doesn't really exist outside of mobile, but it very much does.
There are a few things that make this hard for other kinds of developers to do well.
One is that it requires apps to have file formats, and most devs don't learn these days how to design them. They know SQL but not how to wrangle complicated byte arrays. And our infrastructure kinda sucks at working with files. You can't POST a directory over HTTP, for example. SQLite helps a lot here, but you still have to publicly document your schemas if you want the user to truly have a chance at "owning" their data. The cloud then just holds a backup of these files.
The desire for strong collaboration and sharing features is another problem. Apple isn't particularly big on those. However, a lot of software barely needs it! In most cases a way to publish or share documents with a hyperlink is good enough, especially if others can comment on it. A lot of users don't actually want to allow unrestricted real-time edits to their work, as it undermines ownership, hence why Google Docs has now moved to a more Word-like workflow where edits are held in suspension until explicitly accepted.
The biggest problem is that the only form of sharing that actually works well is an HTTP link, due mostly to OS makers catastrophically dropping the ball on anything else. And that pushes people towards the web. Apple solve this with light web renderers for their file formats that let people view read only copies, more or less. Often this is sufficient.
The Apple approach + an openly installable backend is a reasonable way to do local-first software. It doesn't imply the software has to be free or open source. It can still be proprietary, you just need to sell the backend as software as well as the frontend.
I'm building a flashcards app using Pouch / Couch DB at the moment. It's not done yet, but I wonder if making the switch to Automerge would be worth it?
Are there any resources to learn more about the differences it makes for a developer when implementing a traditional server-client architecture as opposed to a P2P approach? So far I have only worked with server-client architectures, so it's hard for me to estimate the pros/cons.
Also, is there an option for Automerge to sync via a central server instead? Any examples?
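On that last question: Automerge ships a documented sync protocol that is topology-agnostic, so a "central server" can simply be one peer that every client syncs with. A rough sketch of one side of the exchange (the transport and surrounding loop are left out):

    import * as Automerge from '@automerge/automerge';

    type Doc = { cards: { front: string; back: string }[] };

    let doc = Automerge.change(Automerge.init<Doc>(), d => { d.cards = []; });
    let syncState = Automerge.initSyncState(); // one per remote peer

    // Produce the next message for the peer (null when already in sync).
    function nextMessage(): Uint8Array | null {
      const [state, msg] = Automerge.generateSyncMessage(doc, syncState);
      syncState = state;
      return msg;
    }

    // Apply a message received from the peer.
    function onMessage(msg: Uint8Array): void {
      const [newDoc, state] = Automerge.receiveSyncMessage(doc, syncState, msg);
      doc = newDoc;
      syncState = state;
    }

    // A central "sync server" is just another peer running this same loop
    // against every connected client, e.g. over WebSockets.

There is also automerge-repo, which packages this protocol together with network and storage adapters, if you'd rather not wire it up by hand.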
So has there been much progress? Is there, for example, a rich text CRDT editor that adheres to these principles?
I made a local-first platform designed for privacy, ease of use, and no vendor lock-in, to sync and back up precious data using SQLite and CRDTs.
https://github.com/evoluhq/evolu
“Sooner or later, everything old is new again.”
How do CRDTs work with blobs / files?
I don't know. I guess they simply don't.
My approach would be to categorize user content into
- CRDT-enabled content considered as the norm, and
- content trapped in legacy containers that do not support CRDT.
With an implied warning for the user: legacy content offers a limited user experience.
Migration from "legacy" to "normal" should be fully supported, and as transparent as possible (but needs dedicated support for each file format). Converting "normal" (CRDT-enabled) content into files/blobs should be treated as a "take an snapshot" export operation.
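One pattern worth noting for the blob question above: keep the bytes outside the CRDT as immutable, content-addressed values, and let the CRDT hold only a small last-writer-wins reference to them. A generic sketch of the idea (my own illustration, not a specific library's API):

    import { createHash } from 'node:crypto';

    // Immutable blob store: the key is the hash of the content, so there is
    // nothing to merge -- the same bytes always get the same key.
    const blobs = new Map<string, Uint8Array>();
    function putBlob(bytes: Uint8Array): string {
      const key = createHash('sha256').update(bytes).digest('hex');
      blobs.set(key, bytes);
      return key;
    }

    // The CRDT side holds only a tiny last-writer-wins register pointing at a blob.
    interface BlobRef { hash: string; timestamp: number; actor: string }

    function mergeRef(a: BlobRef, b: BlobRef): BlobRef {
      // Later timestamp wins; actor id breaks ties deterministically.
      if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
      return a.actor > b.actor ? a : b;
    }

    // "Replacing" an attachment means writing new bytes and updating the register;
    // concurrent replacements converge on one winner, and the losing blob can
    // still be surfaced to the user as a conflict copy.
    const ref1: BlobRef = { hash: putBlob(new TextEncoder().encode('v1')), timestamp: 1, actor: 'a' };
    const ref2: BlobRef = { hash: putBlob(new TextEncoder().encode('v2')), timestamp: 2, actor: 'b' };
    console.log(mergeRef(ref1, ref2).hash === ref2.hash); // true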