Comment by jchanimal
2 days ago
These are awesome questions. I'll try to fold the answers into the docs also.
1. The embedded database subscribes to the remote sync endpoint when it is connected. This subscription might be polling, websocket, or anything else. The local embedded database will try to keep up with changes anyone pushes to the remote endpoint. This is more a backend mechanical thing than an API you'll see.
Your code can subscribe to the local database. This runs on the JavaScript event loop, and any update, local or remote, will cause your callback to run. The upshot is that all you have to do is connect your database to the sync endpoint and it will stay up to date, and you can also connect your UI to the database via `db.subscribe()`.
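To make the shape of that concrete, here is a minimal toy sketch of a local store whose subscribers fire for both local writes and changes applied from a remote endpoint. `ToyLocalDb`, `applyRemote`, and the document shape are hypothetical names made up for illustration; this is not the Fireproof implementation, and the real `db.subscribe()` signature may differ.

```typescript
// Toy model only: hypothetical names, not the Fireproof API or internals.
type Doc = { _id: string; [key: string]: unknown };
type Listener = (docs: Doc[]) => void;

class ToyLocalDb {
  private docs = new Map<string, Doc>();
  private listeners = new Set<Listener>();

  // Register a callback; returns a function that removes it again.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // A write made by this client.
  put(doc: Doc): void {
    this.apply([doc]);
  }

  // Changes that arrived from the sync endpoint (polling, websocket, ...).
  applyRemote(docs: Doc[]): void {
    this.apply(docs);
  }

  get(id: string): Doc | undefined {
    return this.docs.get(id);
  }

  // Local and remote updates share one path, so subscribers see both.
  private apply(docs: Doc[]): void {
    docs.forEach((doc) => this.docs.set(doc._id, doc));
    this.listeners.forEach((listener) => listener(docs));
  }
}

const db = new ToyLocalDb();
const seen: string[] = [];
const unsubscribe = db.subscribe((docs) => {
  docs.forEach((d) => seen.push(d._id));
});

db.put({ _id: "local-1", text: "written here" });
db.applyRemote([{ _id: "remote-1", text: "written elsewhere" }]);
unsubscribe();
db.put({ _id: "local-2" }); // stored, but no callback after unsubscribing
```

The point of the sketch is only that the UI code does not need to care where an update came from: it subscribes once and gets called for both kinds of change.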
2. Updates are written to local storage (IndexedDB or the filesystem) as encrypted blobs. These are then replicated to the cloud (without being parsed by the cloud). We have SQL connectors also, but we haven't done the Postgres-specific stuff (just started designing it). That is the data side. There is also the clock register, which the client updates to point to the most recent blob. This register is multi-writer safe, and can occasionally point to more than one "head" blob, in which case the client does the deterministic merge on read.
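A rough way to picture the clock register and its occasional multiple heads is the toy sketch below. Everything in it is an assumption for illustration: the `Blob` shape, the `ToyClockRegister` class, and the merge rule (fold heads in sorted id order, so the later id wins a conflicting key) are invented here and are not Fireproof internals. In this toy, each blob carries a full unencrypted snapshot to keep the example short.

```typescript
// Toy model only: hypothetical structures, not Fireproof internals.
type Blob = { id: string; parents: string[]; data: Record<string, string> };

class ToyClockRegister {
  private heads = new Set<string>();

  // Point the register at a new blob; the parents it supersedes drop out.
  advance(blob: Blob): void {
    blob.parents.forEach((p) => this.heads.delete(p));
    this.heads.add(blob.id);
  }

  // Sorted, so every reader sees the heads in the same order.
  currentHeads(): string[] {
    return Array.from(this.heads).sort();
  }
}

// Merge on read: fold the heads in sorted order, so the result does not
// depend on the order in which writers reached the register.
function readMerged(
  register: ToyClockRegister,
  store: Map<string, Blob>
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const id of register.currentHeads()) {
    const blob = store.get(id);
    if (blob) Object.assign(merged, blob.data);
  }
  return merged;
}

const store = new Map<string, Blob>();
const register = new ToyClockRegister();
const save = (blob: Blob): void => {
  store.set(blob.id, blob);
  register.advance(blob);
};

save({ id: "a", parents: [], data: { title: "draft" } });
// Two writers both build on "a" without seeing each other.
save({ id: "c", parents: ["a"], data: { title: "final", bob: "yo" } });
save({ id: "b", parents: ["a"], data: { title: "draft", alice: "hi" } });

const headsWhileForked = register.currentHeads(); // two heads: "b" and "c"
const mergedView = readMerged(register, store);

// A later write that builds on both heads collapses the fork again.
save({ id: "d", parents: ["b", "c"], data: mergedView });
const headsAfterMerge = register.currentHeads();
```

Note that the blob store itself never needs to understand the data; only the client reading the heads does the merge.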
3. In my experience most people use the defaults, so we have Fireproof Cloud, which uses R2 and Durable Objects. We also have a SAM template for AWS and a connector for Netlify, in addition to things that are more like parts for building your own backend (file and HTTP endpoints).
4. Each ledger replicates 100% when it syncs, so all hosts have the same data (no sharding within a ledger). Typically you have one centralized endpoint to sync via. (P2P is possible, but I bet you'd end up contributing some plumbing to the project.) So in this case the class would have a URL that is the sync point, and everyone would pull from it periodically or via streaming.
Merges are idempotent, deterministic, associative, and commutative, so it doesn't matter in what order the teacher and students apply updates to their local instances: once all updates are applied, they have the same state.
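For readers who want to see why those four properties give convergence, here is a small sketch using a last-writer-wins map, one well-known merge that has them. This is a swapped-in illustration, not necessarily the merge Fireproof uses; the `Entry` shape, the logical `clock` field, and the tiebreak on writer id are all assumptions.

```typescript
// Illustrative last-writer-wins map; not Fireproof's actual merge.
type Entry = { value: string; clock: number; writer: string };
type State = Map<string, Entry>;

// Total order on entries: higher clock wins, then writer id, then value.
function wins(a: Entry, b: Entry): boolean {
  if (a.clock !== b.clock) return a.clock > b.clock;
  if (a.writer !== b.writer) return a.writer > b.writer;
  return a.value > b.value;
}

// Per-key maximum under that order. Taking a maximum is idempotent,
// commutative, and associative, and the function has no hidden inputs.
function merge(a: State, b: State): State {
  const out = new Map(a);
  b.forEach((theirs, key) => {
    const mine = out.get(key);
    if (mine === undefined || wins(theirs, mine)) out.set(key, theirs);
  });
  return out;
}

// Order-independent serialization, used only to compare states.
function canonical(s: State): string {
  const rows: string[] = [];
  s.forEach((e, k) => rows.push(JSON.stringify([k, e.value, e.clock, e.writer])));
  return rows.sort().join("\n");
}

const teacher: State = new Map<string, Entry>([
  ["assignment", { value: "Read ch. 3", clock: 1, writer: "teacher" }],
]);
const alice: State = new Map<string, Entry>([
  ["alice-notes", { value: "started", clock: 2, writer: "alice" }],
  ["assignment", { value: "Read ch. 3 (done)", clock: 3, writer: "alice" }],
]);
const bob: State = new Map<string, Entry>([
  ["assignment", { value: "Read ch. 4", clock: 3, writer: "bob" }],
]);

// Different replicas apply the same updates in different orders.
const onTeacherLaptop = merge(merge(teacher, alice), bob);
const onBobPhone = merge(bob, merge(alice, teacher));
const appliedTwice = merge(onTeacherLaptop, onTeacherLaptop);

const converged = canonical(onTeacherLaptop) === canonical(onBobPhone);
const idempotent = canonical(appliedTwice) === canonical(onTeacherLaptop);
```

In this toy, the two `assignment` edits tie on `clock`, so the writer id breaks the tie the same way on every replica, which is what makes the outcome deterministic.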
5. The e2e encryption means you'd have to give the keys to the server to allow it to create subsets for sync, so we haven't done that yet. Our next optimization is to sync the read-only current dataset first, then any extra data needed for writing, and, only when necessary, the historical log. This still doesn't solve the subset sync issue, but it will benefit all use cases immediately.
There is some cool research we might use for subset sync: https://g-trees.github.io/g_trees/
But more practical is probably to finish the Postgres backend and then build subsetting at the global (multi-ledger) dataset level.
wrt (3): Being able to self-host is extremely important. I noticed a lot of focus in the docs on the Quickstart/client usage, but things like setting the default storage engine and the storage path via ENV variables are very important too.
hmm. Replicate state to all clients. Ok.
Seems like an opinionated but well thought through project. Godspeed!
Thanks, and thanks for the encouragement to fully document the gateway interface. We have been flux-ing it lately, but as soon as it settles down we’ll do that.
The vision is many small ledgers, so the full replication per ledger makes sense, but we have work to do on cross-ledger queries.