Comment by spyspy
7 days ago
I'm still convinced the vast majority of kafka implementations could be replaced with `SELECT * FROM mytable ORDER BY timestamp ASC`
Pull vs. push. Plus, once you start storing the last timestamp so you only select the delta, sharding your DB, and dealing with the complexities of clocks differing across tables and replication lag, it quickly becomes evident that Kafka is better in this regard.
But yeah, for a lot of implementations you don't need streaming. For pull-based apps you design your architecture differently: some things are a lot easier than with a DB, some things are harder.
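Concretely, the timestamp-delta polling the parent describes might look like the sketch below (table and column names are made up for illustration). The equal-timestamp case at the end shows one of the complexities mentioned: two writes committed with the same timestamp can be silently skipped by a consumer that checkpoints between them.

```python
import sqlite3
import time

# In-memory DB purely for illustration; a real service would use a file or server DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts REAL, payload TEXT)")

def poll_since(last_ts: float):
    # Only select the delta: rows strictly newer than the last timestamp we saw.
    return conn.execute(
        "SELECT ts, payload FROM mytable WHERE ts > ? ORDER BY ts",
        (last_ts,),
    ).fetchall()

t = time.time()
conn.execute("INSERT INTO mytable VALUES (?, ?)", (t, "first"))
# Two writers can easily commit with the *same* timestamp:
conn.execute("INSERT INTO mytable VALUES (?, ?)", (t, "second"))
conn.commit()

batch = poll_since(0.0)        # sees both rows
last_ts = batch[0][0]
# A consumer that checkpointed last_ts right after handling "first"
# would never see "second", because `ts > last_ts` excludes equal timestamps:
missed = poll_since(last_ts)
```

This is exactly the kind of edge case Kafka's offsets (strictly ordered per partition) avoid by construction.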
Funny you mention that, because Kafka consumers actually pull messages.
What is the reason for using Kafka then? Sorry if I'm missing something fundamental.
1 reply →
Not by busy-waiting in a loop on a database query, though.
Sure, if you're working on a small homelab with minimal to no processing volume.
The second you approach any kind of scale, this falls apart and/or you end up with a more expensive and worse version of Kafka.
I think there is a wide spectrum between small-homelab and google scale.
I was surprised how far SQLite goes, with some sharding, on modern SSDs for those in-between-scale services/SaaS.
What you're doing is fine for a homelab, or for learning. But barring any very specific reason other than just not liking Kafka, it's bad. The second that pattern needs to be fanned out to support even 50+ producers/consumers, the overhead and complexity needed to manage already-solved problems becomes a very bad design choice.
Kafka already solves this problem and gives me message durability, near-infinite scale-out, sharding, delivery guarantees, etc. out of the box. I do not care to develop this, reshard databases, or productionize it myself.
5 replies →
"Any kind of scale"? No, there's a long stretch of better and more straightforward solutions before you outgrow the simple SELECT.
`SELECT * FROM events WHERE timestamp > :last_ts ORDER BY timestamp LIMIT 50`, for example.
Yes but try putting that on your CV.
That is exactly what I am doing with sqlite.
Have a table-level seqno: a monotonically increasing number stamped on every mutation. When a subscriber connects, it asks for rows > the subscriber's last-handled seqno.
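A minimal sketch of that seqno pattern with Python's sqlite3 (table and function names are made up, not from the original comment). Using an `INTEGER PRIMARY KEY` gives a monotonically increasing rowid-backed seqno, which avoids the equal-timestamp problem entirely:

```python
import sqlite3

# In-memory DB for illustration; a real service would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        seqno   INTEGER PRIMARY KEY AUTOINCREMENT,  -- stamped on every mutation
        payload TEXT NOT NULL
    )
""")

def publish(payload: str) -> None:
    # Every insert gets the next seqno automatically.
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    conn.commit()

def poll(last_seqno: int, limit: int = 50):
    # Subscriber asks only for rows beyond its last-handled seqno.
    return conn.execute(
        "SELECT seqno, payload FROM events WHERE seqno > ? ORDER BY seqno LIMIT ?",
        (last_seqno, limit),
    ).fetchall()

publish("a")
publish("b")
publish("c")

batch = poll(last_seqno=0, limit=2)   # first two events
last = batch[-1][0]                   # checkpoint: remember where we stopped
batch2 = poll(last_seqno=last)        # resume from the delta
```

Because seqnos are strictly increasing and unique, the checkpoint is unambiguous; the subscriber never double-reads or skips a row, much like committing a Kafka offset.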