Comment by ffsm8

2 months ago

> your side effects (e.g. database writes) aren't idempotent

What does idempotent mean in this context, or did you mean atomic/rollback on error?

I'm confused because how could a database write be idempotent in Django? Maybe if it introduced a version on each entity and used that for crdt on writes? But that'd be a significant performance impact, as it couldn't just be a single write anymore, instead they'd have to do it via multiple round trips

6 comments

ffsm8

jon-wood 2 months ago

In the context of background jobs idempotent means that if your job gets run for a second time (and it will get run for a second time at some point, they all do at-least-once delivery) there aren't any unfortunate side effects to that. Often that's just a case of checking if the relevant database updates have already been done, maybe not firing a push notification in cases of a repeated job.

7bit 2 months ago
If you need idempotent db writes, then use something like Temporal. You can't really blame Celery for not having that because that is not what Celery aims to be.
- SkyArrow 2 months ago
  
  With Temporal, your activity logic still needs to ensure idempotency e.g. by checking if an event id / idempotency key exists in a table. It's still at-least-once delivery. Temporal does make it easy to mint an idempotency key by concatenating workflow run id and activity id, if you don't have a one provided client-side.
- hintoftime 2 months ago
  
  Temporal requires a lot more setup than setting up a Redis instance though. That's the only problem with it. And I find the Python API a bit more difficult to grasp. But otherwise a solid piece of technology.

hintoftime 2 months ago

Here is a nice guide from AWS https://docs.aws.amazon.com/wellarchitected/latest/framework...

teaearlgraycold 2 months ago

In my experience async job idempotency is implemented as upserts. Insert all job outputs on the first run. Do (mostly) nothing on subsequent runs. Maybe increment a counter or timestamp.