Comment by spyspy
7 days ago
I'm still convinced the vast majority of kafka implementations could be replaced with `SELECT * FROM mytable ORDER BY timestamp ASC`
Pull vs. push. Plus, once you start storing the last timestamp so you only select the delta, sharding your DB, and dealing with the complexities of clocks differing across tables and replication lag, it quickly becomes evident that Kafka is better in this regard.
But yeah, for a lot of implementations you don't need streaming. For pull-based apps you design your architecture differently: some things are a lot easier than with a DB, some things are harder.
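Concretely, the timestamp-delta polling the parent describes might look like the sketch below (table and column names are made up for illustration). The equal-timestamp case at the end shows one of the complexities mentioned: two writes committed with the same timestamp can be silently skipped by a consumer that checkpoints between them.

```python
import sqlite3
import time

# In-memory DB purely for illustration; a real service would use a file or server DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts REAL, payload TEXT)")

def poll_since(last_ts: float):
    # Only select the delta: rows strictly newer than the last timestamp we saw.
    return conn.execute(
        "SELECT ts, payload FROM mytable WHERE ts > ? ORDER BY ts",
        (last_ts,),
    ).fetchall()

t = time.time()
conn.execute("INSERT INTO mytable VALUES (?, ?)", (t, "first"))
# Two writers can easily commit with the *same* timestamp:
conn.execute("INSERT INTO mytable VALUES (?, ?)", (t, "second"))
conn.commit()

batch = poll_since(0.0)        # sees both rows
last_ts = batch[0][0]
# A consumer that checkpointed last_ts right after handling "first"
# would never see "second", because `ts > last_ts` excludes equal timestamps:
missed = poll_since(last_ts)
```

This is exactly the kind of edge case Kafka's offsets (strictly ordered per partition) avoid by construction.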
Funny you mention that, because Kafka consumers actually pull messages.
What is the reason for using Kafka then? Sorry if I'm missing something fundamental.
1 reply →
Not by busy-waiting in a loop on a database query, though.
Sure, if you're working on a small homelab with minimal to no processing volume.
The second you approach any kind of scale, this falls apart and/or you end up with a more expensive and worse version of Kafka.
I think there is a wide spectrum between small-homelab and google scale.
I was surprised how far SQLite goes, with some sharding, on modern SSDs for those in-between-scale services/SaaS.
What you're doing is fine for a homelab, or for learning. But barring any very specific reason other than just not liking Kafka, it's bad. The second that pattern needs to be fanned out to support even 50+ producers/consumers, the overhead and complexity needed to manage already-solved problems becomes a very bad design choice.
Kafka already solves this problem and gives me message durability, near-infinite scale-out, sharding, delivery guarantees, etc. out of the box. I do not care to develop this, reshard databases, or productionize it myself.
5 replies →
"Any kind of scale"? No, there's a long stretch of better and more straightforward solutions before you outgrow the simple SELECT.
`SELECT * FROM events WHERE timestamp > :last_ts ORDER BY timestamp LIMIT 50`, for example.
Yes but try putting that on your CV.
That is exactly what I am doing with sqlite.
Have a table-level seqno: a monotonically increasing number stamped on every mutation. When a subscriber connects, it asks for rows > the subscriber's last-handled seqno.
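A minimal sketch of that seqno pattern with Python's sqlite3 (table and function names are made up, not from the original comment). Using an `INTEGER PRIMARY KEY` gives a monotonically increasing rowid-backed seqno, which avoids the equal-timestamp problem entirely:

```python
import sqlite3

# In-memory DB for illustration; a real service would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        seqno   INTEGER PRIMARY KEY AUTOINCREMENT,  -- stamped on every mutation
        payload TEXT NOT NULL
    )
""")

def publish(payload: str) -> None:
    # Every insert gets the next seqno automatically.
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    conn.commit()

def poll(last_seqno: int, limit: int = 50):
    # Subscriber asks only for rows beyond its last-handled seqno.
    return conn.execute(
        "SELECT seqno, payload FROM events WHERE seqno > ? ORDER BY seqno LIMIT ?",
        (last_seqno, limit),
    ).fetchall()

publish("a")
publish("b")
publish("c")

batch = poll(last_seqno=0, limit=2)   # first two events
last = batch[-1][0]                   # checkpoint: remember where we stopped
batch2 = poll(last_seqno=last)        # resume from the delta
```

Because seqnos are strictly increasing and unique, the checkpoint is unambiguous; the subscriber never double-reads or skips a row, much like committing a Kafka offset.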