Comment by lidavidm

3 days ago

Can't you use DataFusion on a single node, without any of the rest of the Apache ecosystem? They have a Python library, and DataFusion is "just" a query engine. (If anything, I'd call Pandas the batteries-included option...)

I think the difference is more that DataFusion is built as a library, so you can plug it into the product you're building (e.g. Comet, which plugs it into Spark, or pg_lakehouse, which plugs it into Postgres). Polars could be used that way too, but it's also a fully functional package you can pip install and use as a Pandas alternative right now.