They are both columnar data stores, and while they solve the same problem, I wouldn't use them in the same situations. DuckDB is often referred to as the SQLite of analytics, meaning it's lightweight and embeddable. ClickHouse, on the other hand, is definitely the way to go if you need to distribute your queries over multiple servers.
If your workload fits on a single server and you only need standard SQL functions, both will serve you well. If you have more specific needs, have a look at the documentation. For example, ClickHouse has very extensive support for nested arrays, which can prove quite useful.
DuckDB has also gained mindshare as an engine for reading Parquet from data lakes. The fact that it's embeddable enables some very creative uses. It helped that, for a time, DuckDB was substantially quicker than ClickHouse at reading Parquet. That advantage has eroded with recent improvements to ClickHouse's Parquet support, and I expect the gap to close quickly.
Scale. DuckDB chokes at a certain point (just as SQLite isn't in the same league as MySQL or PostgreSQL in terms of scalability). That's why they're building a better/bigger version.
Different beasts, but if by any chance you love ClickHouse already and just want to run OLAP queries in-process, there's chdb: https://github.com/chdb-io/chdb
They solve the same problem in that they are both OLAP data stores, but that's where the similarity ends. ClickHouse is a centralised OLAP store (like tens of others), whilst DuckDB is an embedded database that usually runs in-process.
What is it about DuckDB and its strange cult-like following? It's nice that it's in-process, but then it's an incremental improvement over Pandas. A nice, well-implemented tool, but I don't see what's transformative about it.
Also, clickhouse-local exists: https://clickhouse.com/docs/en/operations/utilities/clickhou...
FWIW, you can check out clickbench.com, a benchmark that includes Parquet (partitioned) results for both ClickHouse and DuckDB.
ClickHouse's power is having one binary that runs anywhere:
- local
- server
- cloud (*)
- serverless
- in-process (https://github.com/chdb-io/chdb, similar to DuckDB)
(*) except for the forked cloud versions: ClickHouse Inc, Huawei, etc.