Comment by abelanger

1 month ago

Disclaimer: I'm a co-founder of Hatchet (https://github.com/hatchet-dev/hatchet), which is a Postgres-backed task queue that supports durable execution.

> Because a step transition is just a Postgres write (~1ms) versus an async dispatch from an external orchestrator (~100ms), it means DBOS is 25x faster than AWS Step Functions

Durable execution engines deployed as an external orchestrator will always be slower than direct DB writes, but the 1ms versus ~100ms gap doesn't seem inherent to the orchestrator being external. In the case of Hatchet, pushing work takes ~15ms and invoking the work takes ~1ms if deployed in the same VPC, and 90% of that execution time is spent on the database. In the best case, an external orchestrator should take about 2x as long to write a step transition (a round-trip network call to the orchestrator plus the database write), so an ideal external orchestrator would add ~2ms of latency here.

There are also some tradeoffs to a library-only mode that aren't discussed. How would work that requires global coordination between workers behave in this model? Let's say, for example, a global rate limit -- you'd ideally want to avoid contention on rate limit rows, assuming they're stored in Postgres, but each worker attempting to acquire a rate limit simultaneously would slow down start time significantly (and place additional load on the DB). Whereas with a single external orchestrator (or leader election), you can significantly increase throughput by acquiring rate limits as part of a push-based assignment process.
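To make the contention concrete, here's a minimal sketch of worker-side rate-limit acquisition against a shared Postgres row (hypothetical `rate_limits` schema and token-bucket math, not any particular library's implementation). Every worker that wants to start a task takes a row lock on the same tuple, so acquisitions serialize and each attempt is an extra round trip to the primary:

```python
# Illustrative only: a token-bucket rate limit stored as a single Postgres row.
# Hypothetical schema: rate_limits(key text primary key, tokens float8, last_refill timestamptz).
import psycopg

ACQUIRE_SQL = """
UPDATE rate_limits
SET tokens = LEAST(%(rate)s,
                   tokens + EXTRACT(EPOCH FROM (now() - last_refill)) * %(rate)s) - 1,
    last_refill = now()
WHERE key = %(key)s
  AND LEAST(%(rate)s,
            tokens + EXTRACT(EPOCH FROM (now() - last_refill)) * %(rate)s) >= 1
"""

def try_acquire(conn: psycopg.Connection, key: str, rate_per_sec: float) -> bool:
    # Every worker calling this locks the same row, so acquisitions serialize
    # on that tuple and each attempt adds load on the primary database.
    with conn.transaction():
        cur = conn.execute(ACQUIRE_SQL, {"key": key, "rate": rate_per_sec})
        return cur.rowcount == 1  # 1 = got a token, 0 = bucket was empty
```

A single orchestrator (or elected leader) can instead debit the bucket once per batch of assignments, which is the push-based advantage described above.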

The same problem of coordination arises if many workers are competing for the same work -- for example if a machine crashes while doing work, as described in the article. I'm assuming there's some kind of polling happening which uses FOR UPDATE SKIP LOCKED, which concerns me as you start to scale up the number of workers.
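For reference, the competing-consumers pattern being described usually looks something like this (a generic sketch with a hypothetical `tasks` table, not Hatchet's or DBOS's actual schema). SKIP LOCKED keeps workers from blocking on rows another worker already holds, but every poll is still a query against the primary, so poll frequency times worker count turns into steady load:

```python
# Generic competing-consumers polling loop over a hypothetical `tasks` table.
import time
import psycopg

CLAIM_SQL = """
WITH claimed AS (
    SELECT id
    FROM tasks
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT %(batch)s
    FOR UPDATE SKIP LOCKED
)
UPDATE tasks t
SET status = 'running', claimed_by = %(worker)s, claimed_at = now()
FROM claimed
WHERE t.id = claimed.id
RETURNING t.id, t.payload
"""

def poll_loop(conn: psycopg.Connection, worker_id: str, run_task) -> None:
    while True:
        with conn.transaction():
            rows = conn.execute(CLAIM_SQL, {"batch": 10, "worker": worker_id}).fetchall()
        for task_id, payload in rows:
            run_task(task_id, payload)  # user-supplied task runner
        if not rows:
            time.sleep(0.25)  # back off when nothing was claimed
```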

You have a great point that scaling a library model requires careful design, but we think that's worth it to provide a superior developer experience.

For example, the DBOS library provides an API to instruct a worker to recover specific tasks. In our hosted platform (DBOS Cloud), when a worker crashes, a central server uses this API to tell the workers what to recover. We like this design because it provides the best of both worlds--the coordination decision is centralized, so it's performant/scalable, but the actual recovery and workflow execution is done in-process, so DBOS doesn't turn your program into a distributed system the way Step Functions/Temporal do (I haven't used Hatchet).
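As a rough illustration of that split (hypothetical names and signatures, not the actual DBOS API): the coordinator only decides what to recover, and the worker re-executes the workflow in-process from its last durably recorded step.

```python
# Illustrative sketch only -- hypothetical names, not the actual DBOS API.
# The decision about what to recover is centralized; recovery itself runs
# in-process on a worker, resuming from durably recorded step outputs.

# --- central control plane (single coordinator) ---
def handle_worker_crash(db, crashed_executor_id: str, healthy_worker) -> None:
    pending = db.execute(
        "SELECT workflow_id FROM workflow_status"
        " WHERE executor_id = %s AND status = 'PENDING'",
        (crashed_executor_id,),
    ).fetchall()
    for (workflow_id,) in pending:
        # Only the decision is centralized: point a live worker at the work.
        healthy_worker.recover(workflow_id)

# --- worker process (library embedded in the application) ---
class Worker:
    def recover(self, workflow_id: str) -> None:
        # In-process recovery: completed steps return their recorded outputs,
        # and execution resumes at the first step without a recorded output.
        print(f"resuming workflow {workflow_id} from its last completed step")
```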

  • Definitely agree that the dev experience is better with a library, particularly for lightweight and low-volume tasks (Hatchet is also moving in the same direction; we'll be releasing a library-only mode this month). And I really like the transactional safety built into DBOS!

    My concern is what happens as you start to see higher volume, more workers, or load patterns that don't correspond to your primary API. At that point, a dedicated database and service to orchestrate tasks start to become more necessary.

    • I disagree. Once you get to the scale that breaks a library-based system like DBOS, you need to move away from a central coordinator. At that point your software has to adjust, and you have to build your application to work without those types of centralized systems.

      Centralized systems don't scale to the biggest problems no matter how clever you are.

      And usually the best way to scale is to decentralize. Make idempotent updates that can be done more locally and use eventual consistency to keep things aligned.
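      A concrete (illustrative) version of that: key each write by a deterministic idempotency key, so retries, redeliveries, and crash-replays collapse into a single effect and can be applied locally without coordination. Hypothetical table and names:

      ```python
      import json
      import psycopg

      def record_result(conn: psycopg.Connection, idempotency_key: str, result: dict) -> None:
          # Re-applying the same write (retry, redelivery, crash replay) is a no-op,
          # so the update can be made locally and reconciled lazily.
          conn.execute(
              """
              INSERT INTO task_results (idempotency_key, result, recorded_at)
              VALUES (%s, %s::jsonb, now())
              ON CONFLICT (idempotency_key) DO NOTHING
              """,
              (idempotency_key, json.dumps(result)),
          )
          conn.commit()
      ```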


Great points. Besides performance, centralized coordination with a distributed data plane is also better for the operability of schedulers. Some examples: rolling out new features in the scheduler, tracing scheduling behavior and decisions, and deploying configuration changes.

Even with a centralized scheduler, it should be possible to create a DevEx that uses decorators to author workflows easily.

We are doing that with Indexify (https://github.com/tensorlakeai/indexify) for authoring data-intensive workflows that process unstructured data (documents, videos, etc.) - it’s like Spark but uses Python instead of Scala/SQL/UDFs. Indexify’s scheduler is centralized and uses RocksDB under the hood for persistence; longer term we are moving to a hybrid storage system - S3 for less frequently updated data, and SSD for a read cache and frequently updated data (ongoing tasks).

The scheduler’s latency for scheduling new tasks is consistently under 800 microseconds on SSDs.
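Back to the decorator DevEx point above: decorators don't require the library model, since a decorator can simply register the function and forward invocations to the remote scheduler. A hypothetical sketch (illustrative names only, not Indexify's actual API):

```python
# Hypothetical sketch: decorator-based authoring in front of a centralized
# scheduler. The decorator registers metadata and forwards invocations;
# placement, retries, and state stay in the external scheduler service.
import functools
from typing import Any, Callable

class SchedulerClient:
    """Stand-in for an RPC client talking to the scheduler service."""
    def submit(self, step_name: str, args: tuple, kwargs: dict) -> str:
        # A real client would POST the invocation and return a task id.
        print(f"submitted {step_name} args={args} kwargs={kwargs}")
        return "task-123"

_client = SchedulerClient()

def workflow_step(name: str) -> Callable:
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def submit(*args: Any, **kwargs: Any) -> str:
            return _client.submit(name, args, kwargs)
        submit.__wrapped_step__ = fn  # workers look up the real function here
        return submit
    return decorate

@workflow_step("extract_text")
def extract_text(document_url: str) -> str:
    ...  # user code; runs later on whichever worker the scheduler picks
```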

This is how schedulers with a proven track record in production - Borg, HashiCorp Nomad, etc. - have traditionally been designed. There are many ways a centralized scheduler can scale out beyond a single machine; parallel scheduling across shards of jobs and node pools, then linearizing and deduplicating conflicting writes, is one such approach.

Love DBOS and Hatchet! Cheering for you @jedberg and @abelanger :-)

Disclaimer - I am the founder of Tensorlake, and worked on Nomad and Apache Mesos in the past.