Comment by citguru

3 days ago

Yes, this is actually one of the core problems OpenDuck's architecture addresses.

The short version: OpenDuck interposes a differential storage layer between DuckDB and the underlying file. DuckDB still sees a normal file (via FUSE on Linux or an in-process FileSystem on any platform), but underneath, writes go to append-only layers and reads are resolved by overlaying those layers newest-first. Sealing a layer creates an immutable snapshot.

This gives you:

Many concurrent readers: each reader opens a snapshot, which is a frozen, consistent view of the database. They don't touch the writer's active layer at all. No locks contended.

One serialized write path: multiple clients can submit writes, but they're ordered through a single gateway/primary rather than racing on the same file. This is intentional: DuckDB's storage engine was never designed for multi-process byte-level writes, and pretending otherwise leads to corruption. Instead, OpenDuck serializes mutations at a higher level and gives you safe concurrency via snapshots.

So for your specific scenario — one process writing while you want to quickly inspect or query the DB from the CLI — you'd be able to open a read-only snapshot mount (or attach with ?snapshot=<uuid>) from a second process and query freely. The writer keeps going, new snapshots appear as checkpoints seal, and readers can pick up the latest snapshot whenever they're ready.

It's not unconstrained multi-writer OLTP (that's an explicit non-goal), but it does solve the "I literally cannot even read the database while another process has it open" problem that makes DuckDB painful in practice.

0 comments

citguru

No comments yet

Contribute on Hacker News ↗