Comment by chatmasta

3 days ago

They’re similar, but DuckDB is more of a batteries-included database, whereas DataFusion is an embeddable query engine. You can use DuckDB in embedded-ish scenarios, but that’s not its primary use case. To put it another way, DataFusion is sometimes described as “the LLVM of databases.”

Another difference is that DuckDB is written in C++ whereas DataFusion is written in Rust, so all the usual memory-safety and performance arguments apply. In fact, DataFusion has recently overtaken DuckDB in ClickBench results after a community push last year to optimize its performance.

We tried both about 8 months ago. At the time, DuckDB’s Node driver leaked memory and segfaulted, and DataFusion was missing some features we wanted. But they are both improving rapidly.

> DataFusion has recently overtaken DuckDB in ClickBench results after a community push last year

Really? I don't see it near the top.

[CH benchmarks](https://benchmark.clickhouse.com/#eyjzexn0zw0ionsiqwxsb3leqi...)