Comment by imranq
1 day ago
This presentation does a good job distilling why FireDucks is so fast:
https://fireducks-dev.github.io/files/20241003_PyConZA.pdf
The main reasons are:
* multithreading
* rewriting base pandas functions like dropna in C++
* a built-in compiler that removes unused code
Pretty impressive, especially given that you just import fireducks.pandas as pd instead of import pandas as pd and you are good to go.
However, I think that if you are using a pandas function that wasn't rewritten, you might not see the speedups.
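For illustration, a minimal sketch of that drop-in usage, assuming the documented import alias; the file name and column names here are made up:

    # Only the import line changes; the rest is ordinary pandas code.
    # "sales.csv", "amount", and "region" are hypothetical.
    import fireducks.pandas as pd  # instead of: import pandas as pd

    df = pd.read_csv("sales.csv")
    result = (
        df.dropna(subset=["amount"])   # one of the functions the slides say is rewritten in C++
          .groupby("region")["amount"]
          .sum()
    )
    print(result)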
It’s not clear to me why this would be faster than Polars, DuckDB, Vaex, or ClickHouse. They seem to be taking the same approach: multithreading, optimizing the query plan, using Arrow, and optimizing core functions like group-by.
None of those are drop-in replacements for pandas. The main draw is "faster without changing your code".
I’m asking more about what techniques they used to get the performance improvements shown in the slides.
They are showing a 20-30% improvement over Polars, ClickHouse, and DuckDB. But those three tools are SOTA in this area and generally rank near each other in every benchmark.
So a 20-30% improvement over that cluster makes me interested to know what techniques they are using to pull ahead of their peers.
Maybe it isn’t? Maybe they just want a fast pandas API?
According to their benchmarks, they are faster. Not by a lot, but still significantly.