Comment by ahoka
8 months ago
I think it's just horrible software built on great ideas sold on a false premise (this is a generic message queue and if you don't use this you cannot "scale").
8 months ago
I think it's just horrible software built on great ideas sold on a false premise (this is a generic message queue and if you don't use this you cannot "scale").
It's not just about the scaling, it's about solving the "doing two things" problem.
If you take action a, then action b, your system will throw 500s fairly regularly between those two steps, leaving your user in an inconsistent state. (a = pay money, b = receive item). Re-ordering the steps will just make it break differently.
If you stick both actions into a single event ({userid} paid {money} for {item}) then "two things" has just become "one thing" in your system. The user either paid money for item, or didn't. Your warehouse team can read this list of events to figure out which items to ship, and your payments team can read this list of events to figure out users' balances and owed taxes.
(You could do the one-thing-instead-of-two-things using a DB instead of Kafka, but then you have to invent some kind of pub-sub so that callers know when to check for new events.)
Also it's silly waiting around to see exceptions build up in your dev logs, or for angry customers to reach out via support tickets. When your implementation depends on publishing literal events of what happened, you can spin up side-cars which verify properties of your system in (soft) real-time. One side-car could just read all the ({userid} paid {money} for {item}) events and ({item} has been shipped) events. It's a few lines of code to match those together and all of a sudden you have a monitor of "Whose items haven't been shipped?". Then you can debug-in-bulk (before the customers get angry and reach out) rather than scour the developer logs for individual userIds to try to piece together what happened.
Also, read this thread https://news.ycombinator.com/item?id=43776967 from a day ago, and compare this approach to what's going on in there, with audit trails, soft-deletes and updated_at fields.
I kind of agree on the horrible software bit, but what do you use instead? And can you convince your company to use that, too?
I find that many such systems really just need a scalable messaging system. Use RabbitMQ, Nats, Pub/Sub, ... There are plenty.
Confluent has rather good marketing and when you need messaging but can also gain a persistent, super scalable data store and more, why not use that instead? The obvious answer is: Because there is no one-size-fits-all-solution with no drawbacks.