Comment by kalendos

3 days ago

You might need to adjust the filters to get an apples-to-apples comparison.

https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...

It's not clear why someone would need to give up the native DuckDB format if it is much faster.

  • Because it means you need to keep another copy of your data in a special format just for DuckDB. The point of Parquet is that it’s an open format queryable by multiple tools. You don’t need to wait to load every table into a new format, you don’t need to retain multiple copies, and you don’t need to keep them in sync.

    If DuckDB is the only query engine in your analytics stack, then it makes sense to use its specialized format. But that’s not the typical Lakehouse use case.

    • > But that’s not the typical Lakehouse use case.

      That benchmark is also not a typical lakehouse use case: all the data is hosted locally, so it doesn't test a significant component of the stack.
