Comment by galeaspablo
7 months ago
Let’s separate two advantages in the average case with Kafka.
1. Sequential IO when reading from disk.
2. Use of disk cache (instead of reading from disk) when re-reading recently read events.
#2 helps when you have many consumer groups reading from the tail, and this advantage would extend to index-based streaming.
But #1 would not fully extend to index-based streaming.
When does this matter? When you add a new consumer group. It consumes from the beginning (which isn’t in disk cache), so with index-based streaming you would lose the speed advantage of sequential IO.
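To make the difference concrete, here is a minimal Python sketch of the two read patterns. It is not how Kafka or any real index-based store works internally; the fixed-size records, the in-memory `index` dict, and every function name are assumptions invented for illustration.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed fixed record size, keeps offset arithmetic trivial


def write_log(path, keys):
    """Append one record per key; return a hypothetical index: key -> [byte offsets]."""
    index = {}
    with open(path, "wb") as f:
        for seq, key in enumerate(keys):
            offset = f.tell()
            f.write(f"{key}:{seq}".encode().ljust(RECORD_SIZE, b" "))
            index.setdefault(key, []).append(offset)
    return index


def read_sequential(path):
    """Partition-style scan: a single pass from offset 0 with no seeks."""
    records = []
    with open(path, "rb") as f:
        while chunk := f.read(RECORD_SIZE):
            records.append(chunk.decode().strip())
    return records


def read_by_index(path, offsets):
    """Index-based read: seek to each offset and count non-contiguous jumps."""
    records, jumps, expected = [], 0, 0
    with open(path, "rb") as f:
        for offset in offsets:
            if offset != expected:
                jumps += 1  # this read does not continue where the last one ended
            f.seek(offset)
            records.append(f.read(RECORD_SIZE).decode().strip())
            expected = offset + RECORD_SIZE
    return records, jumps


def demo():
    keys = ["a", "b", "c", "a", "b", "a"]
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    index = write_log(path, keys)
    everything = read_sequential(path)
    only_a, jumps = read_by_index(path, index["a"])
    return everything, only_a, jumps
```

Calling `demo()` reads the whole log in one contiguous pass, while reading only the "a" events takes two non-contiguous seeks for three records. On a cold read from a spinning disk those jumps are where the sequential IO advantage is lost; from page cache or an SSD they cost far less.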
BUT this has become less important now that SSDs are so prevalent and affordable. Additionally, in practice, the bottleneck usually isn’t disk IO: consumers tend to perform IO in other systems that incur O(log n) per insert, or network cards get saturated well before disk IO becomes the limiting factor.
I speculate that we got Kafka et al. because we didn’t have such an abundance of SSDs in the early 2010s.
So, returning to your question, you wouldn’t notice the difference in the average case, as long as there are SSDs under the hood.