← Back to context

Comment by bitexploder

1 day ago

I started setting up my workflows using Temporal. It deploys as relatively light weight local app. For an isolated local installation it uses SQLite. It makes the process of dealing with API retries and organizing workflows and tasks really simple. I recommend giving it a try. It is, philosophically, exactly what this article is suggesting, but it adds an incredibly rich and flexible interface for agents to work with. Additionally, the web UI makes it very easy to inspect workflows, review agent execution, etc. Temporal also encodes much higher reliability into your system, almost for free. Distributed and reliable systems are hard, don't reinvent the wheel IMO.

If you find yourself wanting things like an easy way to then introspect your SQLite database, figure out what is happening in the workflow, compose individual tasks, make workflows trivially callable, etc, give Temporal a look.

Alongside this, I have mostly moved away from files for agents. Markdown and JSON are great, but also feel like traps when building out smaller local apps. LLMs are great at SQLite and you can render anything you want out of it (Markdown, JSON, etc). It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through markdown. You get a nice portable self contained data management system that encourages agents to be more disciplined about how they structure their data than a bunch of files. It also continues to scale into MySQL/Postgres if your little local projects start to outgrow or become more formal, you already have schema and discipline around data.

It sounds like you’re running this mostly on a single machine? Temporal gets much more complex with scale. Cassandra isn’t fun to manage. Ringpop and TChannel are hard to debug when things go wrong. The SQL backend support doesn’t support horizontally scaled replicas (just single instance) due to consistency requirements. Depending on how your code is written, modifying code baked into workflows becomes complex, as anything that modifies the history event ordering breaks determinism in already-deployed workers.

We use it heavily and everyone who started on it doing simple scripting/automation all love it, everyone who built real production systems on top of it all hate it. Possibly operator error, but my experience hasn’t matched the rosy picture painted in these comments.

  • In the last two years, we built (with a team of 15, now 100) a billion dollar business on top of Temporal that performs business critical applications for fortune 500 companies. We couldn't be happier with temporal.

    Determinism sucks, you do have to work hard and make everything idempotent in activities like we would for durable software anyway. The language we used was incorrect (Go) and has a lot of boilerplate compared to alternatives we later investigated (Python and TypeScript). Visibility can be slow and misses information. We needed to write our own APIs to work effectively with Agents for root-cause analysis of failures.

    With all the caveats - Temporal is amazing, it feels much better than previous orchestrators I used like Prefect or Airflow. 100% would adopt again.

    • That is the real truth people are voicing when they say Temporal is heavy. They are really saying: Durable, reliable, distributed workloads are hard and it takes effort to manage! And that is true. I know of no systems that make that genuinely easy. It is a hard discipline. Maybe Temporal makes that harder than it should be, but I have no experience there.

      There are no free lunches in this space. I have no idea how good or bad Temporal is since my usage is pretty small and isolated, but software rarely just works and impresses me and Temporal for my local machine orchestrating genuinely did. I think Netflix's conductor is another cool option, but I ended up with Temporal due to license.

  • > Depending on how your code is written, modifying code baked into workflows becomes complex, as anything that modifies the history event ordering breaks determinism in already-deployed workers.

    I see this as Temporal surfacing inherent complexity of the domain in a way that forces the developer to consider it, rather than introducing extra complexity.

    If it didn't make workflow determinism a strict requirement, the requirement would still exist - it would just hurt much worse in production when it's broken.

    See also: Rust borrowing

    • Actually Temporal does have a way to avoid determinism called rainbow deployments.

      If you're fine with deploying several versions of workers (and are on a reasonably new version) you can just avoid the determinism issue altogether with their k8s controller.

      If you do need to have some long workflows, there is an explicit hook for "what happens to existing workflows on version upgrade".

      But to be fair - none of the other orchastrators I used (like AirFlow) made me write workflows.IsNewCode/IsOldCode like temporal does. On the other hand AirFlow doesn't even have the capability to do that in the first place (or at least it didn't last I used it).

  • Yep. Individual systems with yolo agents doing stuff in isolation. I could see how it can get complex. Most distributed systems are. No free lunches I guess. I am not sure what the alternatives are at scale.

  • "gets much more complex with scale" feels like the crux of it. Pick the right solution for your intended scale

    That said, I appreciate this is hard in practice. We need to start small to manage the development rabbit hole risk, while also wanting to dream big. There is a tension there that I find hard to balance.

  • Temporal feels massive, I tried it for a small workflow embedded on my system, and worked fine, but when thinking on scaling it, it just didn't made any sense for my use case.

    I also have restate.dev on my reseearch list, which on paper should scale well and be definitely more lightweight and simple to setup, worth having a look.

My current client has a forest of 90+ SnapLogic pipelines that were badly written and maintained even worse; one of those was completely wrong, in that it generated wrong accounting data which could eventually have financial, fiducial and legal repercussions.

I rewrote the pipeline in Python (a correct version of it) with state management in SQLite and logs in plain old flat files, and everything has been running smoothly ever since. In fact this is the only data flow that has worked without errors or interruptions in the last six months.

Instead of replicating the db file with Litestream I do a remote backup with Restic before and after each run; it's not an exact replacement of Litestream as we could possibly lose a whole run if the machine died / disappeared at the end of a run, but it lets one restore any day very easily. In an ideal world I think we should have both (live replica + backups).

Word on HN is that you're either paying more money than you expected for temporal's managed solution or taking on substantial ops burden ultimately running their very heavy system yourself.

I wouldn't know, I've not done either, but I'd like to learn more from your or other's experience.

  • I told an agent to set it up for me for some local stuff. It is written in Go. It has a painless path to run on a local SQLite DB. My agents use it to organize and coordinate workflows. It handles retries and long horizon tasks fine. As far as I can tell for the core workflows and tasks pieces it’s great. MIT license. Like anything it isn’t free to manage but it offers a lot in return. High reliability systems are hard. Temporal only solves some of it. It is far better than rolling it yourself.

    I think a genuine problem right now is people are building agentic work flows and learning the hard way highly reliable agentic work flows are hard. Agents are unreliable. They are both not deterministic and not the backing APIs have pretty high error rates. Temporal has solved that pain for me and made it easy to diagnose problems.

    I don’t have anything really large scale running. But big enough that it takes billions of tokens and high reliability to finish.

  • Could you expand on the "substantial ops burden"? Let's say you're using a managed Postgres instance as the underlying data store, how substantial is the ops burden in that case? I understand that temporal is actually a set of 4 or so microservices on top of a data store, but if you're already running a distributed system backed by k8s or something like that, it doesn't seem like it adds significant incremental ops on top of that. But I could be wrong.

    • I run my own temporal service in my k8s cluster; this setup is the backbone for almost all my applications. For simplicity, I opted for the postgres backend. You still need to run the 4 (?) other service (history, matching, frontend, ui, maybe others, definitely others if you want observability with prometheus/grafana, and tad bit more complexity if you want tailscale to get in there and poke around).

      They ship Helm charts so reality is somewhere between "helm deploy" and "substantial ops burden". I don't have to touch it very frequently, but that is not to say I don't have to touch it. There's occasional releases and there have been times where (probably due to my inexperience with helm) I botched an upgrade and lost some data. And I've been on this journey for years; when I first started, they didn't have a Python SDK and it was one of my (many) excuses to learn Go. But anyway to your point, yes, if you're comfortable with k8s and Helm then you shouldn't have much of a problem running hundreds of thousands of workflows; if you want to really push the throughput and optimize cost you probably need to get creative the individual services and look into cassandra (maybe? idk).

    • As a dev I would tell you its an ops burden.

      My devops coworker just shrugs, pumps out some yaml and helm and away it goes.

      It really depends on your experience and tolerance for a lot of things.

      Usually maintenance burden doesent start to make itself known till you get off the happy path or something breaks. Sometimes it can be a long while before that happens, sometimes it happens right away.

      1 reply →

    • I think it depends a lot on the operational maturity of the company. Some places are running the LGTM observability stack, sentry for error reporting, 24/7 on call rotations, playbooks for all alerts, etc. Those organizations will have less issues running systems like temporal because the operational framework is already there.

      Other orgs have never heard of alerts or error reporting and naturally will not catch issues until they are catastrophic (for example services that crash frequently in the background go unnoticed until the crash frequency causes a catastrophic failure). In my experience a lot of issues are pretty simple such as running out of memory, CPU throttling, crashes caused by simple bugs (nil panics). If you have good observability you can catch those issues early.

      For example: people rag on Ceph that their cluster somehow got into a broken state, but that really only occurs when abuse of the ceph cluster has went on long enough that the cluster finally reaches the tipping point where it is unrecoverable. If you set ceph up, follow the correct replication rules so components are spread across failure domains, and use the metrics and alerts that are distributed with ceph it is actually quite hard to break the cluster.

    • In my experience with a relatively modest number of concurrent workflows (think hundreds) you'll be pushing several thousand transactions per second through that postgres instance.

      As best I can tell it doesn't do any batching of it's writes/reads, and it's update heavy in places rather than append (I suspect their cloud version might do some of these things)

      It's pretty close to "let's make every function call serialise it's parameters/return value, go through a postgres table and several network hops"

      That said it can be very useful, but it's a heavy tool that's best suited for high value/risk workflows where you're earning enough from the execution that you can afford the overhead (for example an Uber trip with several dollars of service fees is probably a good fit, unsurprisingly since it's roots are from Uber)

  • Very heavy indeed, people will confuse the durability that Temporal provide with all the other properties a distributed system needs. They will then think that Temporal will solve all their problems.

  • Their managed solution is pricey and especially the linear scaling with how much you use it is very meh. It's comparable with AWS lambda which also isn't cheap. However it's minor on a typical cloud bill.

    Self-hosting is very easy in my experience, I've done it for 2 years but management wanted to move to Temporal Cloud. They have a helm chart which just works including upgrades. This does assume you have the whole k8s shebang set up and working in your company. I never had to touch is outside upgrades which took maybe 30m including validation.

This reads like an advertisement for Temporal :)

  • I'm someone else who has inherited a bunch of ad-hoc orchestration systems and also used Temporal quite heavily. The latter does certainly come with some overhead (not so bad in the age of LLMs), but it also guides you along a well-trodden path of good practices. The latter being very important - it means that when you want to take on more advanced capabilities, you probably haven't painted yourself into a corner too badly and can take that on fairly easily. Think: retries, multi-tenancy, multi-lang, observability, etc.

  • I just spent the last two weeks digging into workflow state engines and temporal was one of the candidates. It is a VC backed fork of Cadence. The got 0.3B funding and whatever positive I read about them on the net I take with a big spoon of salt. Just my 2 cents.

  • Low key amazing tech, kinda like clickhouse - nobody is bragging it’s running their business

  • I can vouch for them too, being a super early adopter. One of the best early bets I've ever made. Awesome OSS product, glad the team decided to leave Uber to commercialize it.

  • Well, just my experience. I installed it, had my agents configure it and it immediately solved problems I had with very little friction. Dealing with long running, long horizon agentic tasks that need very high reliability so I don’t have to babysit. I vibed the first version, realized I was reinventing reliable distributed systems. Stopped vibing and started surveying for something that fit :)

  • It does, my experience has been that it adds code complexity, deployment complexity, and performance problems. There are some observability benefits, but other ways to solve that. It's possible there are workloads that fit it but not anything I've personally worked on.

Could you give an example of a case where you'd use SQLite instead of jq or grep through Markdown?

  • My favorite lens on SQLite is that it is actually two things:

    1. A robust durability implementation 2. A library of high performance data structure and algorithms

    The fact this it's SQL is nice, but those two attributes are what make it great.

    For example, I'm implement an in-process event log that I want to be durable. I started simple, but soon saw some edge cases and instead of playing whackamole I just swapped to using sqlite as an ordered kv store that gives me ACID.

    Another example: ingesting multiple inter related datasets. Instead of a dozen hash maps in memory, I load them up into sqlite (no persistence) and then slice and dice as I need to.

    It's a super useful tool.

    • mirrors my own experience creating a persistent event log. I started with JSON, then JSONL, etc until finally landing on SQLite.

  • The moment my JSON has any sort of depth and I need to write a parser for it and potentially account for unspecified behavior. JSON's nice when it's nice, but it's terrible when it's terrible. It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!

    The underlying database isn't the most important thing. Just use SQL. Its namespacing (eg, through CTEs) is good and you're more likely to have colleagues who know SQL compared to jq.

    • > It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!

      As an occasional consumer of JSON/CSV, that's why I really like DuckDB, it's just SQL for such file formats. And it manages to be super fast at it too.

  • > an example of a case where you'd use SQLite instead of jq or grep through Markdown?

    Usually we end up writing a script to incrementally refresh a data-set I'm analyzing (or have someone send me a copy after they pull it).

    I've been using sqlite for anything which needs an UPDATE - modifying a row deep inside the data-set with jsonl is a pain.

    My github is full of java programs which update sqlite3 files with threadpools and a single big lock around the UPDATE (& then I write or have an agent write code to analyze it).

    DuckDB is slowly replacing it in the context of python, simply because of the ease of pushing a UDF into the SQL.

    Also because I really like expressing things as LEAD/LAG with a UDF on top.

  • SQLite is more efficient for large data sets. A single markdown or JSON file needs to be streamed to locate a piece of data O(n). Updating an existing entry in a sequential file is even worse because you have to rewrite the file. SQLite has the data structures to quickly find data in O(log n) time.

  • Honest answer is: whenever your markdown or json files get to be big enough that grep/jq takes long enough that you get bored waiting for it.

Interesting about the files vs db approach. I have been going back and fourth. I landed on db as well.

> It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through markdown

Just wanted to make sure no one missed this point in your comment because eventually users will be paying the full cost for tokens instead of VC's paying, with GitHub Copilot's pricing realignment leading the way.