Comment by ashishbagri
5 days ago
Yes, it's true that if you just want to send data from Kafka to ClickHouse and don't care about duplicates, there are several ways. We even covered them in a blog post -> https://www.glassflow.dev/blog/part-1-kafka-to-clickhouse-da...
However, the reason we started building this is that duplication is a sad reality in streaming pipelines, and the methods for cleaning up duplicates on ClickHouse are not good enough (again, covered extensively on our blog with references to the ClickHouse docs).
The approach you mention about deduplication is 100% accurate. The goal in building this tool is to provide a drop-in node for your pipeline (just as you said), with optimised source and sink connectors for reliability and durability.