Comment by ffsm8

2 months ago

> your side effects (e.g. database writes) aren't idempotent

What does idempotent mean in this context, or did you mean atomic/rollback on error?

I'm confused because how could a database write be idempotent in Django? Maybe if it introduced a version on each entity and used that for crdt on writes? But that'd be a significant performance impact, as it couldn't just be a single write anymore, instead they'd have to do it via multiple round trips

In the context of background jobs idempotent means that if your job gets run for a second time (and it will get run for a second time at some point, they all do at-least-once delivery) there aren't any unfortunate side effects to that. Often that's just a case of checking if the relevant database updates have already been done, maybe not firing a push notification in cases of a repeated job.

  • If you need idempotent db writes, then use something like Temporal. You can't really blame Celery for not having that because that is not what Celery aims to be.

    • With Temporal, your activity logic still needs to ensure idempotency e.g. by checking if an event id / idempotency key exists in a table. It's still at-least-once delivery. Temporal does make it easy to mint an idempotency key by concatenating workflow run id and activity id, if you don't have a one provided client-side.

    • Temporal requires a lot more setup than setting up a Redis instance though. That's the only problem with it. And I find the Python API a bit more difficult to grasp. But otherwise a solid piece of technology.

In my experience async job idempotency is implemented as upserts. Insert all job outputs on the first run. Do (mostly) nothing on subsequent runs. Maybe increment a counter or timestamp.