Comment by beders
1 month ago
It is important to note that employing any workflow engine needs careful examination of the benefits vs. drawbacks.
You will need full and complete control over workflow orchestration, restarts, deletes, logging, versioning etc.
Your steps will get stuck, they will error out unexpectedly and you will need complete transparency and operational tools to deal with that.
In many cases you will be better off doing the book-keeping yourself. A process should be defined in terms of your domain data. Typically you have statuses (not the best choice, but I digress) that change over time, so you can report on the whole lifecycle of your shopping cart.
Now, and this is crucial, how you implement state changes - any state change - should be exactly the same for a "workflow" as for a non-workflow! It needs to be. A shipment is either not ready yet or done - this information should not be in a "workflow state".
Let's say you shut down your system and start it back up: do you have a mechanism in place that can "continue where it left off"? If so, you likely don't need a workflow engine.
In our case, on startup, we need to query the system for carts that are waiting for shipments and that are not being dealt with yet. Then fan out tasks for those.
That is robust in the face of changed data. If you employ a workflow engine, changed data always needs to consider two worlds: your own domain data and any book-keeping that is potentially done in any workflow.
Building a complex stateful system will always be hard, but workflows as an abstraction have two big benefits:
1. Automatic handling of any transient failures or service interruptions/crashes/restarts. Transient failures in steps are automatically retried, and service interruptions are automatically recovered from. Even if you're doing your own bookkeeping, doing _recovery_ from that bookkeeping isn't easy, and workflows do it automatically.
2. Built-in observability. Workflows naturally support built-in observability and tooling that isn't easy to build yourself. DBOS integrates with OpenTelemetry, automatically generating complete traces of your workflows and giving you a dashboard to view them from (and because it's all OTel, you can also feed the traces into your existing obs infrastructure). So it's easier to spot a step getting stuck or failing unexpectedly.
Another advantage of DBOS specifically, versus other workflow engines, is that all its bookkeeping is in Postgres and well-documented (https://docs.dbos.dev/explanations/system-tables), so you have full and complete control over your workflows if you need it.
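To make point 1 concrete, here is a rough hand-rolled sketch of what "retry transient failures and recover from bookkeeping" involves. This is not the DBOS API; `run_step`, `TransientError` and the in-memory `checkpoints` dict are hypothetical names, and a real engine would persist the checkpoints durably (e.g. in Postgres) rather than in a dict:

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_step(checkpoints, workflow_id, name, fn, max_attempts=3, base_delay=0.01):
    """Run fn once per (workflow_id, name), retrying transient failures.

    If the step already completed in an earlier run, return the recorded
    result instead of executing it again; that is the recovery part.
    """
    key = (workflow_id, name)
    if key in checkpoints:
        return checkpoints[key]

    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
            break
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))

    # Record the result so a restarted workflow skips this step.
    checkpoints[key] = result
    return result
```

Even this toy version leaves open questions (what if the process dies between `fn()` returning and the checkpoint being written?), which is roughly the part that isn't easy to get right by hand.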
This a thousand times over.
Workflow engines are often complex beasts and rasterizing your business logic into them is nearly always a mistake unless your logic is very simple, in which case why do you need a workflow engine?
They certainly do often make sense when the alternative is building your own generic workflow engine.
Usually I see folks grasp for a workflow system because they don't want to think about or really understand their business logic. They're looking for a silver bullet.
Grounding to reality is key. There’s often a tendency to trust the map over the terrain. This architecture seems to promote relying on the database to always have a perfect representation of state. But consider a scenario where a company restores a 12-hour-old backup, losing hours of state. States that can’t be externally revalidated in such cases are a serious concern.
It's an excellent point. In such a scenario, if you're bound to a rigid workflow system, you will probably have a hard time recreating all the intermediary steps required to get the system back into a consistent state with the external world.
Idempotency is key and the choice of the idempotency key as well ;)
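A small sketch of why the choice of key matters, assuming a hypothetical `ShipmentService` standing in for the external system. The key is derived from domain data (operation plus cart id), so a replay after a restore or restart produces the same key and the external side can deduplicate:

```python
import hashlib


def idempotency_key(operation, cart_id):
    """Derive a stable key from domain data, not from a random or per-run value."""
    return hashlib.sha256(f"{operation}:{cart_id}".encode()).hexdigest()


class ShipmentService:
    """Hypothetical external service that remembers which keys it has seen."""

    def __init__(self):
        self._seen = {}
        self.calls = 0  # number of shipments actually created

    def create_shipment(self, key, cart_id):
        # A repeated key returns the original result instead of shipping twice.
        if key in self._seen:
            return self._seen[key]
        self.calls += 1
        shipment = {"shipment_id": f"shp-{self.calls}", "cart_id": cart_id}
        self._seen[key] = shipment
        return shipment
```

A key generated per attempt (say, a fresh UUID on every retry) would defeat this: after a restore, the replay would look like a brand new request.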