Comment by geysersam

3 days ago

> DataFusion has recently overtaken DuckDB in ClickBench results after a community push last year

Really? I don't see it near the top.

[CH benchmarks](https://benchmark.clickhouse.com/#eyjzexn0zw0ionsiqwxsb3leqi...)

Specifically, DataFusion is faster when querying Parquet files directly.

Most of the ClickBench leaderboard covers database-specific file formats, which you first have to load the data into.

You might need to adjust the filters to get an apples-to-apples comparison.

https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...

  • It's not clear why anyone would need to give up the native DuckDB format if it is much faster.

    • Because it means you need to keep another copy of your data in a special format just for DuckDB. The point of Parquet is that it’s an open format queryable by multiple tools. You don’t need to wait to load every table into a new format, you don’t need to retain multiple copies, and you don’t need to keep them in sync.

      If DuckDB is the only query engine in your analytics stack, then it makes sense to use its specialized format. But that’s not the typical lakehouse use case.
