← Back to context

Comment by nlitened

3 hours ago

I think STM looks pretty cool in toy examples, but in practice it's pretty bad. Very difficult for me to make strong logical argument about this, just based on how it feels.

In Clojure, there are first-class STM primitives with retriable transactions in the standard lib, but every time I tried to use them, the code was slow, difficult to reason about, full of "colored" functions, and non-composable.

Also, unintuitively, the code starts immediately smelling slightly "wrong", as if I tried to put into code my first childish thoughts instead of actually thinking about how the system _should_ work. Like the notorious bank accounts example: yeah, cool, two accounts' balances are TVars — but what if I have 10M accounts? Is the list of accounts also a TVar since sometimes new accounts are created? Are indexes on the list also TVars? How do you persist the lists, how do you synchronize slow persistence and in-memory transactions? How do you synchronize all the deposits with a backup/failover machine? As you continue trying to untangle all this with STM, you start drowning in complexity, and encountering huge contention among transactions touching many parts of your systems, and spewing countless retry logs.

It's not only my own experience — I believe it's widely accepted in Clojure community that nobody actually uses STM, and instead uses simple atomic updates or queues.

I have a similar experience, though it's not necessarily related to shortcomings in the STM implementation (which I always felt was quite competent in the space). To expand, I think a few things are true:

1. I've come to feel that actually managing concurrency is a "below the fold" competency for most engineers. I'm not saying it should be, and people need to understand what a data race is. 2. As you pointed out, most state in services that I work on is not in need of coordinated updates, so atoms are where you want to be, or single-writer queues. 100% agreed 3. Structured concurrency approaches, fibers, and other abstracted runtimes seem to be the best way to get concurrent and parallel work, given points 1 and 2 about how coordinated state really isn't that common for most work on my team

A lot of times when we're updating state we're more having to think about it in the database, employing either a last-write-wins strategy or when coordination is in need, some form of pessimistic or optimistic locking.

Thanks, this is very valuable! Have you tried other implementations of STM such as Haskell's?