Comment by chasil

7 hours ago

This is the actual problem:

"Kamal runs blue-green deploys — it starts a new container, health-checks it, then stops the old one. During the switchover, both containers are running. Both mount ultrathink_storage. Both have the SQLite files open."

WAL mode requires shared access to System V IPC mapped memory. This is unlikely to work across containers.

In case anybody needs a refresher:

https://en.wikipedia.org/wiki/Shared_memory

https://en.wikipedia.org/wiki/CB_UNIX

https://www.ibm.com/docs/en/aix/7.1.0?topic=operations-syste...

23 comments

simonw 7 hours ago

Thanks for this, the anecdote with the lost data was very concerning to me.

I think you're exactly right about the WAL shared memory not crossing the container boundary. EDIT: It looks like WAL works fine across Docker boundaries. See https://kamal-deploy.org/docs/upgrading/proxy-changes/: it looks like Kamal 2's new proxy doesn't support pausing requests yet; they list "Pausing requests" as "coming soon".

hedora 6 hours ago

Pausing requests then running two sqlites momentarily probably won’t prevent corruption. It might make it less likely and harder to catch in testing.
The easiest approach is to kill sqlite, then start the new one. I’d use a unix lockfile as a last-resort mechanism (assuming the container environment doesn’t somehow break those).
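
A minimal sketch of the lockfile idea, assuming both containers mount the same volume and the filesystem honors flock(). The path and the helper names (acquire_lock, release_lock) are made up for illustration; this is not anything Kamal or SQLite provides.

```python
import fcntl
import os
import tempfile


def acquire_lock(path):
    """Try to take an exclusive, non-blocking lock.

    Returns the open file descriptor on success, or None if another
    holder already has the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


# Hypothetical location; in a real deploy this would live on the shared volume.
lock_path = os.path.join(tempfile.mkdtemp(), "app.lock")

first = acquire_lock(lock_path)   # stands in for the old container
second = acquire_lock(lock_path)  # stands in for the new container

if second is None:
    print("lock is held; refusing to open the database")

release_lock(first)
```

The new process would refuse to open the database until the old one has exited and released the lock. Whether flock() behaves across a given container or volume setup is exactly the assumption that would need testing.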

simonw 6 hours ago

I'm saying you pause requests, shut down one of the SQLite containers, start up the other one and un-pause.

Retr0id 6 hours ago

> I think you're exactly right about the WAL shared memory not crossing the container boundary.
I don't, fwiw (so long as all containers are bind mounting the same underlying fs).

simonw 6 hours ago

I just tried an experiment and you're right, WAL mode worked fine across two Docker containers running on the same (macOS) host: https://github.com/simonw/research/tree/main/sqlite-wal-dock...
Could the two containers in the OP have been running on separate filesystems, perhaps?

hedora 6 hours ago

It would explain the corruption:
https://sqlite.org/wal.html
The containers would need to use a path on a shared FS to set up the SHM handle, and, even then, this sounds like the sort of thing you could probably break via arcane misconfiguration.
I agree shm should work in principle though.

chasil 6 hours ago

You might consider taking the database(s) out of WAL mode during a migration.
That would eliminate the need for shared memory.
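
A rough sketch of that suggestion using Python's sqlite3 module. The database path, table, and the set_journal_mode helper are hypothetical. The PRAGMA returns the mode actually in effect, and SQLite keeps the old mode if the change could not be made (for example while other connections are open), so the return value is worth checking.

```python
import os
import sqlite3
import tempfile


def set_journal_mode(conn, mode):
    """Switch journal mode and raise if SQLite did not apply the change."""
    conn.commit()  # journal_mode cannot be changed inside a transaction
    actual = conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    if actual.lower() != mode.lower():
        raise RuntimeError(f"journal mode is still {actual!r}, wanted {mode!r}")
    return actual


# Hypothetical database in a scratch directory.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db_path)

set_journal_mode(conn, "WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# Before the migration: fall back to the rollback journal.
rollback_mode = set_journal_mode(conn, "DELETE")

# Stand-in for the real migration.
conn.execute("ALTER TABLE t ADD COLUMN y TEXT")
conn.commit()

# After the migration: back to WAL.
restored_mode = set_journal_mode(conn, "WAL")
conn.close()
```

This only removes the shared-memory dependency; two writers on the same file still have to be serialized by file locking.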

ncruces 15 minutes ago

This thread in the SQLite forum should be instructive: https://sqlite.org/forum/forumpost/90d6805c7cec827f

gcr 6 hours ago

The SQLite documentation says in strong terms not to do this. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke...
See more: https://sqlite.org/wal.html#concurrency

Retr0id 6 hours ago

They tell you to use a proper FS, which is largely orthogonal to containerization.

jmull 6 hours ago

WAL relies on shared memory, so while a proper FS is necessary, it isn't going to help in this case.

merb 4 hours ago

Btw, the NFS that is mentioned here is fine in sync mode. However, that is slow.

PunchyHamster 4 hours ago

> WAL mode requires shared access to System V IPC mapped memory.
Incorrect. It requires access to mmap().
"The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption."
> This is unlikely to work across containers.
I'd imagine the SQLite code would fail if that were the case; with k8s at least, mounting the same storage into 2 containers causes K8s, in most configurations, to co-locate both pods on the same node, so it should be fine.
It is far more likely they just fucked up the code and lost data that way...
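
A small sketch to illustrate the quoted point that the wal-index is an ordinary file: with a connection open in WAL mode, the -wal and -shm files sit next to the database file in the same directory. The directory and file names here are made up for the demo.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch directory standing in for the shared volume.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "demo.db")

conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# List the directory while the connection is still open; the WAL and
# wal-index files are normally removed when the last connection closes.
files = sorted(os.listdir(db_dir))
print(files)

conn.close()
```

Any process that opens the same path on the same filesystem maps the same -shm file, which is why the container boundary by itself should not matter.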

Retr0id 7 hours ago

> This is unlikely to work across containers.
Why not?

voidfunc 3 hours ago

Ooh new historical Unix variant I had never heard of.. neat!

chasil 3 hours ago

AIX is still supported and sold, so quite current?
Some that I used that are gone... Ultrix (MIPS), Clix, Irix, SunOS 4, SCO OpenServer, TI System V.
https://en.wikipedia.org/wiki/Ultrix
https://en.wikipedia.org/wiki/Intergraph

nxobject 2 hours ago

NeXTstep? (Leaving aside fun spitballing about whether Tahoe is morally OPENSTEP 26, and whether it was NeXT that actually bought Apple for negative $400 million...)
