Comment by jpalomaki
2 months ago
Learned the hard way that it makes sense to use "flock" to prevent overlapping executions of frequently running jobs. The server started to slow down, my monitoring jobs started piling up, which slowed the server down even more.
*/5 * * * * flock -n /var/lock/myjob.lock /usr/local/bin/myjob.sh
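A quick way to see the `-n` (non-blocking) behavior that makes this safe: a second attempt fails fast instead of queueing behind the first. A minimal sketch (the lock path is just for the demo):

```shell
#!/bin/sh
# Demo of flock -n: while one process holds the lock, a second
# non-blocking attempt exits immediately with a non-zero status.
LOCK=/tmp/myjob.demo.lock   # illustrative path

flock -n "$LOCK" sleep 3 &  # first "job" holds the lock for 3 seconds
sleep 1                     # give it time to acquire the lock

if flock -n "$LOCK" true; then
    echo "acquired"
else
    echo "skipped: previous run still holds the lock"
fi
wait
```

With `-n` the overlapping run is skipped rather than queued, so jobs can't pile up the way they do when cron keeps launching new instances.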
We can also use a systemd timer, which ensures there is no overlap.
https://avilpage.com/2024/08/guide-systemd-timer-cronjob.htm...
Have you tested how this behaves on eventually consistent cloud storage?
I'm confused: is EBS eventually consistent? I assume it's strongly consistent, since otherwise a lot of other Linux things would break.
If you're thinking about using NFS, why would you want to distribute your locks across other machines?
Why would anyone want a distributed lock?
Sometimes certain containerized processes need to run according to a schedule, but maintainers also need a way to run them manually without the scheduled run starting or executing concurrently. A shared FS seems like the "simplest thing that could possibly work" distribution method for locks intended for that purpose, but unfortunately not all cloud storage volumes are strongly consistent, even to the same user, and it may take several milliseconds for the lock to take hold.
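For that use case, the fd-style flock idiom lets both the cron entry and a manual invocation contend for the same lock file. A sketch, with an illustrative lock path and fd number; on a shared FS this only helps if the filesystem actually honors the locking semantics:

```shell
#!/bin/sh
# Wrapper that both cron and a human can run: whoever gets the lock
# runs the job; any concurrent invocation exits immediately.
LOCK=/var/lock/myjob.lock   # illustrative; put it on the shared FS

(
    flock -n 9 || { echo "another run is in progress, exiting"; exit 1; }
    # critical section: the actual job
    /usr/local/bin/myjob.sh
) 9>"$LOCK"                 # fd 9 holds the lock; released when it closes
```

Holding the lock on a file descriptor for the whole subshell means it is released automatically even if the job crashes, with no stale-lockfile cleanup needed.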
If a file system implements its lock/unlock functions precisely to the spec, it should be fully consistent for the file or directory being locked. It does not matter whether the file system is local or remote.
In other words, it's not the author's problem. It's the problem of a particular storage vendor that may decide to throw the spec out of the window. But even in an eventually consistent file system, the vendor is better off ensuring that the locking semantics are fully consistent per the spec.