Comment by maxpert
17 hours ago
Author here! Every time I post my own stuff here it seems to sink, so hopefully this actually reaches some of you.
Marmot started as a sidecar project using triggers and polling to replicate changes over NATS. It worked, but I hit a wall pretty fast. Most people really want full ACID compliance and DDL replication across the cluster. I realized the only clean way to do that was to expose SQLite over a standard protocol.
While projects like rqlite use REST and others go the page-capture route, I decided to implement the MySQL protocol instead. It just makes the most sense for compatibility.
I’ve reached a point where it works with WordPress, which theoretically covers a huge chunk of the web. There are scripts in the repo to deploy a WP cluster running on top of Marmot. Any DB change replicates across the whole cluster, so you can finally scale WordPress out properly.
On the performance side, I’m seeing about 6K-7K inserts per second on my local machine with a 3-node quorum. It supports Unix sockets, and you can even have your processes read the SQLite DB file directly while routing writes through the MySQL interface. This gives you a lot of flexibility for read-heavy apps.
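A minimal sketch of that split read/write pattern, assuming the application and a Marmot node share the same SQLite file: reads use a read-only SQLite connection (the `mode=ro` URI parameter), and writes go through a caller-supplied function. `SplitClient` and `write_fn` are hypothetical names. In a real deployment `write_fn` would send the statement to Marmot's MySQL endpoint with whatever MySQL client you use; this sketch does not assume any particular one, and note that many MySQL drivers use `%s` placeholders rather than `?`.

```python
import sqlite3
from typing import Callable, Sequence


class SplitClient:
    """Hypothetical client: local read-only reads, writes routed elsewhere."""

    def __init__(self, db_path: str, write_fn: Callable[[str, Sequence], None]):
        # mode=ro opens the existing file read-only; a write on this
        # connection raises an error instead of silently skipping replication.
        self._reader = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        # write_fn stands in for sending the statement to the MySQL interface.
        self._write_fn = write_fn

    def query(self, sql: str, params: Sequence = ()) -> list:
        """Run a read directly against the local SQLite file."""
        return self._reader.execute(sql, params).fetchall()

    def execute(self, sql: str, params: Sequence = ()) -> None:
        """Route a write through the replicated path."""
        self._write_fn(sql, params)

    def close(self) -> None:
        self._reader.close()
```

Opening the reader with `mode=ro` means an accidental write on the fast local path fails loudly instead of bypassing replication.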
I know the "AI slop" label gets thrown around a lot lately, but I’ve been running this in production consistently. It’s taken a massive number of manual hours to get the behavior exactly where it needs to be.
Just want to note that every time I see it, I’m impressed with the project. Great job so far.
The fact that you’ve been running this with WP is also a really huge use case and demonstration of trust in your own software. IMO this should be featured prominently in the README.
These days I personally just ignore projects that insist on MySQL — Postgres has won in my mind and is the better choice. The only way I’d run something like a WP hosting service is with a tool like Marmot.
One thing you might find interesting is trying Marmot with something like Litestream v2. Marmot of course has its own replication system, but I like the idea of having a backup system writing to S3. It seems trivial (as you’ve noted, you can still work directly on the SQLite file), but it would be a nice blog post/experiment to see “worked out”, so to speak (and it probably wouldn't sink to the bottom of HN!).
Marmot already supports Debezium, so you can do way more than just basic S3 backups. I've noted your suggestions; they're definitely helpful.
Thanks for the consideration! The reason something like Litestream is interesting to me is that it’s (now[0]) an off-the-shelf way to do PITR backups for SQLite.
Sure, I could piece together or write something myself to catch the CDC stream or run another replica, but simply running one more process on one of the boxes and having peace of mind that there’s an S3 backup continuously written is quite nice.
I thought Debezium was mostly for moving CDC records around, not a backup tool per se. I.e., if I were to write Debezium records to object storage with their connectors, it’s my job to get a recent dump and replay?
[0]: https://fly.io/blog/litestream-v050-is-here/
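To make the "dump and replay" question concrete, here is a generic point-in-time restore sketch, not tied to Debezium, Litestream, or Marmot: copy a snapshot with SQLite's online backup API, then re-apply change events recorded after that snapshot up to a target timestamp. The `(ts, op, key, value)` event tuples and the `kv` table are hypothetical stand-ins for decoded CDC records; they are not Debezium's record format.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical decoded CDC event: (timestamp, op, key, value).
Event = Tuple[int, str, str, Optional[str]]


def restore_to_point_in_time(
    snapshot: sqlite3.Connection,
    snapshot_ts: int,
    events: Iterable[Event],
    target_ts: int,
) -> sqlite3.Connection:
    """Copy the snapshot, then replay events with snapshot_ts < ts <= target_ts."""
    restored = sqlite3.connect(":memory:")
    snapshot.backup(restored)  # full copy of the snapshot ("the dump")
    for ts, op, key, value in sorted(events, key=lambda e: e[0]):
        if ts <= snapshot_ts:
            continue  # already reflected in the snapshot
        if ts > target_ts:
            break  # past the requested recovery point
        if op == "upsert":
            restored.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
        elif op == "delete":
            restored.execute("DELETE FROM kv WHERE k = ?", (key,))
        else:
            raise ValueError(f"unknown op: {op}")
    restored.commit()
    return restored
```

The key bookkeeping is knowing which position the snapshot corresponds to, so events already contained in it are skipped rather than applied twice.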
Also a Postgres user. Wondering why the MySQL wire protocol and not Postgres's: did the MySQL choice have advantages over pgsql in this case?
You've raised a question I spent months thinking about. I personally love Postgres; heck, I initially even had a version that spoke the Postgres wire protocol, but with SQLite-only syntax. But then somebody pointed to my WordPress demo, and it became obvious that I had to support the MySQL protocol. It's just a protocol: the underlying technology stays independent of whichever one I choose.
Related, Corrosion has experimental support for the pgsql wire protocol (limited to sqlite-flavored SQL queries): https://superfly.github.io/corrosion/api/pg.html
Since Marmot pivoted to the MySQL wire protocol, I haven't had a clear picture of its advantages over using normal MySQL with active-active replication. Can you speak to that?
Here are some off the top of my head:
- Marmot lets you choose the consistency level (ONE/QUORUM/FULL), versus MySQL's serializable.
- MySQL requires careful setup of replication, conflict avoidance, and monitoring, and fencing a split brain and failing over are manual in many cases. Even right now, Marmot is easier to spin up, and it's leaderless, so your clients can simply talk to different nodes (for example, in round-robin fashion) to distribute load.
- Marmot's eventual consistency plus anti-entropy will recover from split brains without requiring you to do anything. MySQL active-active requires manual ops.
- Marmot is designed for read-heavy edge scenarios. Once I've completed the read-only replica system, you can literally bring lambda nodes up or down with Marmot running as a sidecar. With replicas able to select the DBs they want (WIP), you should be able to bring up region-, org-, or scenario-specific servers with their own lightweight copies, while writes are proxied to the main server. The applications are virtually unlimited: since you can read the SQLite database directly, think many small vector databases distributed to the edge, regional configurations, or catalogs.
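One way to read those consistency levels is by how many node acknowledgements a write waits for. The sketch below assumes the conventional meanings of ONE, QUORUM, and FULL (one node, a strict majority, all nodes); Marmot's exact semantics may differ. The node addresses are hypothetical, and `round_robin` only illustrates the client-side load spreading mentioned for a leaderless cluster.

```python
from enum import Enum
from itertools import cycle
from typing import Iterator, Sequence


class Consistency(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    FULL = "FULL"


def required_acks(level: Consistency, cluster_size: int) -> int:
    """Acks a write waits for under the assumed meaning of each level."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if level is Consistency.ONE:
        return 1
    if level is Consistency.QUORUM:
        return cluster_size // 2 + 1  # strict majority
    return cluster_size  # FULL: every node must acknowledge


def round_robin(nodes: Sequence[str]) -> Iterator[str]:
    """Endless iterator that hands out nodes in turn for client-side load spreading."""
    return cycle(nodes)


# Hypothetical node addresses for a 3-node cluster.
NODES = ["node1:3306", "node2:3306", "node3:3306"]
```

For the 3-node setup mentioned earlier in the thread, QUORUM works out to 2 acks. Majorities are the usual choice because any two strict majorities of the same cluster share at least one node.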
This seems CDC-based; does that mean it handles `now()` and other non-deterministic functions properly?
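The thread doesn't say whether Marmot ships row values or re-executes statements, but the difference is easy to demonstrate. This sketch uses SQLite's `random()` in place of `now()` (SQLite's own clock functions are `datetime('now')` and `CURRENT_TIMESTAMP`), because two nodes reading the clock within the same second would often agree by accident. Re-running the statement on each node lets the values diverge; evaluating once and copying the resulting row keeps replicas identical. All function names are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE t (id INTEGER PRIMARY KEY, r INTEGER)"
STMT = "INSERT INTO t (id, r) VALUES (1, random())"


def fresh_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def replicate_statement(primary: sqlite3.Connection, replica: sqlite3.Connection) -> None:
    # Statement-based: each node evaluates random() on its own.
    primary.execute(STMT)
    replica.execute(STMT)


def replicate_row(primary: sqlite3.Connection, replica: sqlite3.Connection) -> None:
    # Row/CDC-style: evaluate once on the primary, then ship the resulting values.
    primary.execute(STMT)
    row = primary.execute("SELECT id, r FROM t WHERE id = 1").fetchone()
    replica.execute("INSERT INTO t (id, r) VALUES (?, ?)", row)


def value(db: sqlite3.Connection) -> int:
    return db.execute("SELECT r FROM t WHERE id = 1").fetchone()[0]
```

This is why capturing row values generally sidesteps non-deterministic functions, while statement-based replication has to rewrite or restrict them.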