Comment by adrian17

2 days ago

Any explanation what makes it faster than pandas and polars would be nice (at least something more concrete than "leverage the C engine").

My easy guess is that compared to pandas, it's multi-threaded by default, which makes for an easy perf win. But even then, 130-200x feels extreme for a simple sum/mean benchmark. I see they are also doing lazy evaluation and some MLIR/LLVM based JIT work, which is probably enough to get an edge over polars; though its wins over DuckDB _and_ Clickhouse are also surprising out of nowhere.

Also, I thought one of the reasons for Polars's API was that Pandas API is way harder to retrofit lazy evaluation to, so I'm curious how they did that.