Comment by lidavidm
3 days ago
Can't you use DataFusion single node/without any Apache ecosystem stuff? They have a Python library and DataFusion is "just" a query engine. (If anything, I'd call Pandas the batteries included option...)
I think the difference is more that DataFusion is built as a library so you can plug it into the product you're building (e.g. Comet, which plugs it into Spark, or pg_lakehouse, which plugs it into Postgres). Polars could be used that way, but it's also a functional package you can pip install and use as a Pandas alternative right now.
"pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres" https://github.com/paradedb/pg_analytics
It used to use DataFusion. https://www.paradedb.com/blog/iceberg_lakehouse
That's true. We have some more ideas for DataFusion in the works, though... Stay tuned!