Absurd Workflows: Durable Execution with Just Postgres

3 months ago (lucumr.pocoo.org)

I've been keeping an eye on this space for a while as it matures a bit further. There have been a number of startups popping up around this - apart from Temporal and DBOS, Hatchet.run looked interesting.

I've been using BullMQ for a while with distributed workers across K8s and have hacked together what I need, but a lightweight DAG of some sort on Postgres would be great.

I took a brief look at your docs. What would you say is the main difference between yours and some of the other options? Just the simplicity of it being a single SQL file and an SDK wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

  • > I took a brief look at your docs. What would you say is the main difference between yours and some of the other options? Just the simplicity of it being a single SQL file and an SDK wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

    It's really just trying to be as simple as possible. I was motivated to do the simplest thing I could come up with, after not finding the other solutions to be something I wanted to build on.

    I'm sure they are great, but I want to leave the window open to having people self-host what we are building / enable us to deploy a cellular architecture later, and thus I want to stick to a manageable number of services until I no longer can. Postgres is a known quantity in my stack, and the only Postgres-only solution was DBOS, which unfortunately did not look ready for prime time yet when I tried it. That said, I noticed that DBOS is making quite a bit of progress, so I'm somewhat confident that it will eventually get there.

    • Could you provide some more specifics as to why DBOS isn’t “ready for prime time”? Would love to know what you think is missing!

      FWIW DBOS is already in production at multiple Fortune 500 companies.


Wow. Everything old is new again. I built a business state machine for a bespoke application using Oracle 8i and its stateful queues back in 2005. I had re-architected a batch-driven application (which couldn't scale temporally, i.e. we had a bunch of CPU sitting near idle a lot of the time) and turned it into an event-driven solution. CPU usage became almost a horizontal line, saving us lots of money as we scaled (for the record, "scale" for this solution was writing 5M records a day into a partitioned table where we kept 13 months of data online, and then billed on it). Durable execution was just one of the many benefits we got out of this architecture. Love it.

  • It's quite funny in a way for me, because even back in the Cadence days I thought it was the hottest shit ever, but it was just too complex to run for a small company, and Cadence was not the first (SWF and others came before). It felt like unless you had really large workflows you would ignore these systems entirely. And now, due to the problems that agents pose, we're all in need of that.

    I'm happy it's slowly moving towards mass appeal, but I hope we find some simple solutions like Absurd too.

Somebody said this the other day on HN, but we really are living in the golden age of Postgres.

Armin, I managed to review absurd.sql and the migrations. I am so impressed that I am rewriting the state management of my workflow engine with Absurd. Just wanted to thank you for sharing it with us. I'll keep you posted on the outcome.

This is pretty great! The main things you need for durable execution are 1) retries (Absurd does this) and 2) idempotency (Absurd does this via steps, but it would be better handled by the APIs themselves being idempotent, then not using steps; Absurd would certainly _help_ mitigate some APIs not being idempotent, though not completely).

  • > idempotency (Absurd does this via steps, but it would be better handled by the APIs themselves being idempotent, then not using steps

    That is very hard to do with agents, which are just all probabilistic. However, if you do have an API that is either idempotent or uses idempotency keys, you can derive an idempotency key from the task: const idempotencyKey = `${ctx.taskID}:payment`;

    That said: many APIs that support the idempotency-key header only support replays within an hour to 24 hours, so for long-running workflows you need to capture the state output anyway.
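
    A minimal sketch of combining those two ideas, assuming an Absurd-style ctx.step(name, fn) helper that persists the step's return value (paymentApi/createCharge are made-up stand-ins for any client that accepts idempotency keys):

      // Inside a task handler, where ctx is the task context.
      // Derive a stable idempotency key from the task, so retries of the
      // same task reuse the same key instead of minting a new one.
      const idempotencyKey = `${ctx.taskID}:payment`;

      // Wrap the call in a step so the response is checkpointed. Even if
      // the provider's idempotency window (often 1-24 hours) has expired
      // by the time the task retries, the stored step output is replayed
      // instead of hitting the API again.
      const charge = await ctx.step("charge-customer", () =>
        paymentApi.createCharge(
          { amount: 4200, currency: "usd" },
          { idempotencyKey },
        ),
      );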

    • I was not thinking of the agent case specifically. But yes, you have to make the APIs idempotent, either with these step checkpoints or by wrapping the underlying API. It's not hard to build a Postgres-transaction-based idempotency wrapper (a sketch follows at the end of this comment), and then you can have a much longer idempotency TTL.

      > so for long-running workflows you need to capture the state output anyway.

      That would be a _very_ long-running workflow. Probably worth breaking it up into different subtasks or, I guess as Absurd does it, step checkpoints.
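
      A minimal sketch of such a wrapper, assuming a made-up idempotency_keys(key text primary key, response jsonb) table and node-postgres:

        import { Pool } from "pg";

        const pool = new Pool();

        // First caller for a key runs fn and stores the result; replays
        // (and concurrent duplicates, which block on the row lock until
        // COMMIT) get the cached response instead of re-running the call.
        async function idempotent<T>(key: string, fn: () => Promise<T>): Promise<T> {
          const client = await pool.connect();
          try {
            await client.query("BEGIN");
            // Upsert so RETURNING also yields the row on conflict; the
            // row lock serializes concurrent callers with the same key.
            const { rows } = await client.query(
              `INSERT INTO idempotency_keys (key) VALUES ($1)
               ON CONFLICT (key) DO UPDATE SET key = EXCLUDED.key
               RETURNING response`,
              [key],
            );
            if (rows[0].response !== null) {
              await client.query("COMMIT");
              return rows[0].response as T; // replay: cached result
            }
            const result = await fn(); // note: transaction stays open here
            await client.query(
              "UPDATE idempotency_keys SET response = $2 WHERE key = $1",
              [key, JSON.stringify(result)],
            );
            await client.query("COMMIT");
            return result;
          } catch (err) {
            await client.query("ROLLBACK");
            throw err;
          } finally {
            client.release();
          }
        }

      Rows can then be pruned on whatever TTL fits, which is where the much longer window comes from.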

Does anyone have a new approach to this kind of transactional workflow? I've heard that Saga patterns also define invertibility (compensating actions), but I want a more general framework that does all of this in one.

Also, I noticed that durable execution actually has a lot to do with continuation-passing style. Is my intuition correct?

In what sense are these durable, given that they restart from the beginning if the server process crashes?

  • Because the finished steps have their state stored, so they don't repeat.
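
    A minimal sketch of that pattern, against a made-up checkpoints(task_id, step, result) table (not Absurd's actual internals):

      import { Pool } from "pg";

      const db = new Pool();

      // Run fn only if no checkpoint exists for (taskId, name); otherwise
      // return the stored result. After a crash the task re-runs from the
      // top, but completed steps are skipped via their saved state.
      async function step<T>(taskId: string, name: string, fn: () => Promise<T>): Promise<T> {
        const saved = await db.query(
          "SELECT result FROM checkpoints WHERE task_id = $1 AND step = $2",
          [taskId, name],
        );
        if (saved.rows.length > 0) return saved.rows[0].result as T;
        const result = await fn();
        await db.query(
          "INSERT INTO checkpoints (task_id, step, result) VALUES ($1, $2, $3)",
          [taskId, name, JSON.stringify(result)],
        );
        return result;
      }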

    • Except they don't? Look at the example that runs an AI conversation. It always starts from the system prompt. At no point does it load an old conversation from a database.

Restate was built for agents before agents were cool.

Surprisingly, it hasn't taken off yet, when agents are all we are looking for now.

Reminder that Postgres does not have a monopoly on SKIP LOCKED.

You can do that in Oracle, SQL Server, and MySQL too.

In fact, you might be able to replicate what Armin is doing with SQLite, because it too works just fine as a queue, though not via SKIP LOCKED.
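
For reference, a small sketch of the SKIP LOCKED claim against a made-up jobs(id, payload, state) table, using node-postgres:

  import { Pool } from "pg";

  const pool = new Pool();

  // Claim one pending job; SKIP LOCKED makes competing workers skip
  // rows already locked by another worker instead of blocking on them.
  async function claimJob() {
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      const { rows } = await client.query(
        `SELECT id, payload FROM jobs
         WHERE state = 'pending'
         ORDER BY id
         LIMIT 1
         FOR UPDATE SKIP LOCKED`,
      );
      if (rows.length === 0) {
        await client.query("COMMIT");
        return null; // nothing pending (or everything already claimed)
      }
      await client.query(
        "UPDATE jobs SET state = 'running' WHERE id = $1",
        [rows[0].id],
      );
      await client.query("COMMIT");
      return rows[0];
    } catch (err) {
      await client.query("ROLLBACK");
      throw err;
    } finally {
      client.release();
    }
  }

In SQLite (3.35+), a single UPDATE ... RETURNING that flips the state of one pending row should achieve the same claim without SKIP LOCKED, since writes are serialized anyway.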

Other question: why implement your own framework rather than using an existing agent framework like Claude + MCP, or OpenAI + tool calling? Is it because you're using your own LLMs, or just because you wanted more control over retries, etc.?

  • There are not that many agent frameworks around at the moment. If you want to be provider-independent, you most likely use either Pydantic AI or the Vercel AI SDK, would be my guess. Neither has a built-in solution for durable execution, so you end up driving the loop yourself. So it's not that I don't use these SDKs; it's just that I need to drive the loop myself.
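
    A rough sketch of what "driving the loop yourself" means here; the llm client, callTool, and message shapes are made-up, provider-agnostic stand-ins:

      type ToolCall = { id: string; name: string; args: unknown };
      type Message = {
        role: "system" | "user" | "assistant" | "tool";
        content: string;
        toolCalls?: ToolCall[];
        toolCallId?: string;
      };

      // Assumed stand-ins for whatever SDK is actually in use.
      declare const llm: { complete(req: { messages: Message[] }): Promise<Message> };
      declare function callTool(name: string, args: unknown): Promise<string>;

      // The hand-driven agent loop: call the model, run any requested
      // tools, append their outputs, and go around again. A durable
      // execution layer would checkpoint each model call and tool call
      // so a crash resumes mid-conversation instead of starting over.
      async function runAgent(prompt: string): Promise<string> {
        const messages: Message[] = [{ role: "user", content: prompt }];
        while (true) {
          const reply = await llm.complete({ messages });
          messages.push(reply);
          if (!reply.toolCalls?.length) return reply.content; // model is done
          for (const call of reply.toolCalls) {
            const output = await callTool(call.name, call.args);
            messages.push({ role: "tool", toolCallId: call.id, content: output });
          }
        }
      }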

    • Okay, very clear! I was saying that because your post's example is just a kind of basic "tool use" example, which is already implemented by MCP / OpenAI tool use, but obviously I guess your code can be suited to more complex scenarios.

      Two small questions:

      1. In your README you give this example for durable execution:

      const shipment = await ctx.awaitEvent(`shipment.packed:${params.orderId}`);

      I was just wondering, how does it work? I was more expecting a generator with a `yield` statement to run "long-running tasks" in the background... otherwise, is the Node runtime keeping the thread running with the await? Doesn't this "pile up"?

      2. Would your framework be suited to long-running jobs with multiple steps? I sometimes have big jobs running in the background on all of my IoT devices, e.g.:

      for (const d of devices) doSomeWork(d);

      and I'd like to run the big outer loop each hour (say), but only if the previous one is complete (e.g. max number of workers per task = 1), and have the inner loop be "steps" that can be cached but retried if they fail.

      would your framework be suited for that? Or is that just a simpler use case for pgmq, where I don't need the Absurd framework?
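
      A rough sketch of how that shape could look with step checkpoints, assuming an Absurd-style ctx.step helper (Ctx and doSomeWork are stand-ins; the run-at-most-one-instance constraint is left to scheduler config or a lock):

        type Ctx = { step<T>(name: string, fn: () => Promise<T>): Promise<T> };
        declare function doSomeWork(device: string): Promise<void>;

        // Hourly outer task: each device becomes a named step, so a retry
        // of the task replays finished devices from their checkpoints and
        // re-runs only the ones that failed.
        async function syncAllDevices(ctx: Ctx, devices: string[]) {
          for (const d of devices) {
            await ctx.step(`sync:${d}`, () => doSomeWork(d));
          }
        }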
