Comment by throwaw12

2 days ago

Curious to hear about the experience of people using DBOS and Temporal.

I have used Temporal in the past and it works really well. My only problem with it was the limits on request payload and event sizes, which created some inconveniences for us when building solutions. It also enforces good engineering practices, but sometimes you don't want to write special logic so that, if your CSV file is larger than 2 MB, you upload it to S3, pass a link, and then download it in the workflow.
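
A minimal sketch of the workaround described above (pass small payloads inline, swap large ones for a reference). `BlobStore`, `MAX_INLINE_BYTES`, and both helper functions are hypothetical names for illustration; the in-memory store only stands in for S3, and nothing here is Temporal or AWS SDK code.

```python
import uuid

# Hypothetical size limit, mirroring the ~2 MB figure mentioned above.
MAX_INLINE_BYTES = 2 * 1024 * 1024


class BlobStore:
    """In-memory stand-in for an object store such as S3 (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = f"blob/{uuid.uuid4()}"
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def to_workflow_input(data: bytes, store: BlobStore) -> dict:
    """Pass small payloads inline; replace large ones with a reference."""
    if len(data) <= MAX_INLINE_BYTES:
        return {"kind": "inline", "data": data}
    return {"kind": "ref", "key": store.put(data)}


def from_workflow_input(payload: dict, store: BlobStore) -> bytes:
    """On the workflow side: resolve the payload back to bytes."""
    if payload["kind"] == "inline":
        return payload["data"]
    return store.get(payload["key"])
```

The caller would run `to_workflow_input` before starting the workflow, and the code that needs the CSV would call `from_workflow_input`, so the size check lives in one place instead of in every workflow.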

What is your experience with DBOS? How does it compare to Temporal in terms of operational complexity, feature parity, and anything else?

Haven't used DBOS but use Temporal at current job and used it at previous job as well so I have about 1.5 years under me now. I also run it at home to handle some home automation tasks that aren't super time sensitive (the latency of workflows isn't super bad, but I wouldn't use one for something that is triggered by a motion event in my house unless we're talking about a timeout to turn something off after inactivity).

I really like running a thin REST API in front of it, inside your VPC or k8s cluster or whatever, to help with event-driven triggers, so that the event handlers don't have to worry about Temporal auth or about checking workflow status if there is any decision making around that. This helps keep your event handlers as logic-free as possible.

Let me give a vague example: you have some sort of DB trigger, and this trigger either acts directly or puts the event on a queue. Your handler calls the thin REST API with the necessary event details, and the REST API decides whether this starts a workflow, signals an existing one, or gets ignored (the pattern can vary based on the situation, but SignalWithStart is common for me, as is just dropping the event if it is not worthy of starting a workflow and no workflow for that <ItemYouCareAbout> exists).
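
The decision the thin API makes could be sketched roughly like this. It is only an illustration under assumptions: `Event`, `Action`, `START_WORTHY`, and `route` are hypothetical names, the set of running item IDs would really come from a workflow lookup, and no Temporal SDK calls are shown.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SIGNAL_WITH_START = "signal_with_start"
    SIGNAL_EXISTING = "signal_existing"
    IGNORE = "ignore"


@dataclass
class Event:
    item_id: str  # identifier of the <ItemYouCareAbout>
    kind: str     # hypothetical event type, e.g. "created"


# Hypothetical rule: only these event kinds justify starting a workflow.
START_WORTHY = {"created", "reactivated"}


def route(event: Event, running_item_ids: set) -> Action:
    """Decide what the thin API should do with an incoming event."""
    if event.kind in START_WORTHY:
        # Start the workflow if needed and deliver the signal either way.
        return Action.SIGNAL_WITH_START
    if event.item_id in running_item_ids:
        return Action.SIGNAL_EXISTING
    # Not worth starting a workflow, and nothing is running for this item.
    return Action.IGNORE
```

Keeping this in one function means the DB trigger or queue handler only forwards event details and never needs to know about workflow state.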

Then the parent/child workflow ability is very valuable when you need to orchestrate different self-contained behaviors for a single object's lifecycle, with cancellability when an external factor changes the trajectory of an object.

Long, vague story short, I find it very powerful and easy to work with, and it has really helped move lifecycle logic out of APIs, where things can easily become riddled with debt and precarious to manage. I agree with you that it helps you follow best practices instead of just throwing logic some place that seems easy but becomes a hidden trap later.

I thought Temporal was overly complex, but as you said the best part is it does enforce good engineering practices.

Then I tried their Cloud offering and was appalled at their pricing. I burned through the $1,000 free credits before I even got something to production. Didn't want to bother with running a local Temporal, either.

Best solution is to just take inspiration from their architecture and then do it yourself in Postgres, IMO.
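
One possible reading of "do it yourself in Postgres" is a table that checkpoints each workflow step so a retry skips work that already finished. The comment does not describe a design, so everything below is an assumption: the `workflow_steps` table and `run_step` helper are hypothetical, and Python's built-in `sqlite3` is used as a stand-in for Postgres purely so the sketch runs anywhere.

```python
import json
import sqlite3

# sqlite3 is a stand-in; the same table and queries would live in the
# Postgres instance the stack already has.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workflow_steps (
        workflow_id TEXT NOT NULL,
        step_name   TEXT NOT NULL,
        result      TEXT NOT NULL,
        PRIMARY KEY (workflow_id, step_name)
    )
    """
)


def run_step(workflow_id, step_name, fn):
    """Run fn once per (workflow, step); return the stored result afterwards."""
    row = conn.execute(
        "SELECT result FROM workflow_steps WHERE workflow_id = ? AND step_name = ?",
        (workflow_id, step_name),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed: skip re-execution
    result = fn()
    with conn:  # commits the checkpoint, or rolls back on error
        conn.execute(
            "INSERT INTO workflow_steps VALUES (?, ?, ?)",
            (workflow_id, step_name, json.dumps(result)),
        )
    return result
```

A crash between `fn()` and the insert would re-run the step on retry, so steps should be idempotent. Timers, signals, and worker coordination are the parts this sketch leaves out.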

  • That's an interesting take. You didn't want to bother with running a local Temporal, but you are happy to engineer it yourself in Postgres?

    • We already have a Postgres instance running (as I'm sure most stacks have), so it's just another database table rather than a whole new piece of infrastructure that needs to be maintained, with its associated cost, attack surface, risk of Temporal going under or dropping support for OSS, authentication, and other unknowns.

They've just released an external storage approach to solve the large payload issue. I don't 100% love it (it's bolted on, not an intrinsic part), and it's an early release right now - but you can consider this effectively solved for now.

I run a large on-prem temporal setup - throwaway acct as they will likely out me.

Temporal is, in my opinion, having run it in prod for over a year, poorly designed, slow, and ridiculously heavy infra-wise.

If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.

Try running their own benchmarks, the numbers are pathetic.

Their sales team is also absolutely appalling and desperate.

From a Developer standpoint, the SDK is quite nice though.

Don't get trapped into Nexus, and if the sales team calls you, make sure legal is in the room.

  • Honest question: Can you use Temporal Cloud? Have you evaluated Temporal Cloud pricing?

    Ballparking: 200 events/workflow, 200 workflows per day, and assuming 1 event = 1 cloud action[1], that is 1.2M or so actions per month. The $100/month plan includes 1M actions each month, and even the pay-as-you-go pricing when you exceed that is $50 per 1M actions[2].

    Temporal Cloud seems extremely cheap for your use case, even if I'm off by a factor of 10. Is there a catch? You still need infra to run your Temporal workers, and I assume there are storage and other costs, but I assume action usage is the majority of it.

    1. Not sure exactly what constitutes an "Action". At a glance, seems like most events have a corresponding action(?) and a subset of those actions are actually billable(?)

    2. https://docs.temporal.io/cloud/pricing#payg-action-pricing
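
The ballpark in the reply above can be checked with a few lines of Python. The figures are the ones quoted in that reply; the 30-day month, the "1 event = 1 action" simplification, and the "base plan plus overage" cost model are assumptions, and the function names are made up for illustration.

```python
# Figures quoted in the reply above; the 30-day month is an assumption.
EVENTS_PER_WORKFLOW = 200
WORKFLOWS_PER_DAY = 200
DAYS_PER_MONTH = 30
INCLUDED_ACTIONS = 1_000_000   # included in the $100/month plan
BASE_PRICE_USD = 100
OVERAGE_USD_PER_MILLION = 50   # pay-as-you-go rate cited above


def monthly_actions(events_per_workflow, workflows_per_day, days=DAYS_PER_MONTH):
    # Assumes 1 event = 1 billable action, as the reply does.
    return events_per_workflow * workflows_per_day * days


def monthly_cost_usd(actions):
    # Assumed model: flat base price plus overage beyond the included actions.
    overage = max(0, actions - INCLUDED_ACTIONS)
    return BASE_PRICE_USD + overage * OVERAGE_USD_PER_MILLION / 1_000_000


actions = monthly_actions(EVENTS_PER_WORKFLOW, WORKFLOWS_PER_DAY)
print(actions, monthly_cost_usd(actions))
```

Calling `monthly_cost_usd(10 * actions)` gives the "off by a factor of 10" case under the same assumptions.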

    • I was not clear; I did not mean 200 a day. It's tens of thousands of concurrently running workflows, sometimes into the hundreds of thousands, each with 200 events. We run many hundreds of thousands of these a day.

      Temporal was a bad fit for us, and we regret it deeply.

  • > If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.

    Where are the “millions” on infra going? It’s a handful of services and a Postgres?

    > Their sales team is also absolutely appalling and desperate.

    You said “on-prem”. It’s open source; why are you dealing with their sales team?

    > If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day…

    If “millions” were required to obtain such tiny scale, I’d agree there’d be a massive problem. No one would use Temporal; it would be a complete waste of resource. If this were true.

    • We also hit scaling problems with Temporal.

      Postgres doesn't scale at all for our workload, so you're into Cassandra.

      For a medium-sized deployment, you're looking at 200+ vCPUs, and then let's say standard dev/uat/prod. So now you're at 600 CPUs. Now you need two geographic regions (dev can stay in one place), so now you're at 800. Want a failover cluster for prod? Have another 200 CPUs.

      And 200 CPUs is a medium deployment, assuming something like 36 CPUs per Cassandra node, then say 4-8 per instance of matching, worker, history, and frontend. Then there are all your other components around it: ingress controller, service mesh, etc.

      There's a million a year easy, for a small deployment.

      Our prod one is 4x this size.

    • Not a couple hundred in one day, a couple hundred being started, concurrently, every second in a day. Each with ~200 events.

      We need a 12-node Cassandra cluster for this, with 64-CPU nodes. So no, it's not a couple of services and a Postgres.

      As for the sales team: we are an enterprise, and they want to extract money from us.

    • The same with any "open-source" enterprise ($$$) software. It sucks to run yourself. Docs on running/errors are non-existent. Their helm charts are broken. Instead of degraded performance, it just fails.

  • Agree. I have worked in a codebase using Temporal, and it is pretty much a nightmare. I don't know about the infra side, but from the developer side, all the abstractions they bring to the table are poorly designed. Wouldn't recommend.

    • Biggest design bug imo is the workers need to register for the workflows they support, but will happily pull tasks from unrelated workflows if they're on the same queue. No way to put failed tasks back into the queue again either.

  • > if the sales team call you make sure legal is in the room.

    What's the deal? It couldn't harm just listening to sales, could it?

    I presume legal would be involved before anything is signed in any case?

DBOS is much less complex than Temporal. That’s the benefit.

Main tradeoff is lower performance. Or at least, you’re going to be limited to what you can push through Postgres. If that’s sufficient for your needs DBOS is great.