Comment by elasticventures
5 days ago
Exactly, datafusion is implied batteries included apache bigdata ecosystem. Polars is chasing the Python Pandas crowd and uses python syntax, handy if you're already comfortable with ipython.
5 days ago
Exactly, datafusion is implied batteries included apache bigdata ecosystem. Polars is chasing the Python Pandas crowd and uses python syntax, handy if you're already comfortable with ipython.
Can't you use DataFusion single node/without any Apache ecosystem stuff? They have a Python library and DataFusion is "just" a query engine. (If anything, I'd call Pandas the batteries included option...)
I think the difference is more that DataFusion is built as a library so you can plug it into the product you're building (e.g. Comet, which plugs it into Spark, or pg_lakehouse, which plugs it into Postgres). Polars could be used that way, but it's also a functional package you can pip install and use as a Pandas alternative right now.
"pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres" https://github.com/paradedb/pg_analytics
It used to use DataFusion. https://www.paradedb.com/blog/iceberg_lakehouse
1 reply →