Comment by datsci_est_2015
24 days ago
Might be cool once PySpark integrates with Polars, but for now, like many others, I’m stuck dropping into pandas for non-vectorized operations.
Is there any plan for this?
Funnily enough, I actually just added (2 weeks ago) support for streaming from PySpark to Polars/DuckDB/etc. through the Arrow PyCapsule interface. By streaming, I mean actually streaming, not collecting all the data at once. It probably won't be released until May/June, but it's there: https://github.com/apache/spark/commit/ecf179c3485ba8bac72af...
Not that I’m aware of. The Spark ecosystem seems a little too “stable” to be putting effort into that kind of development.
Edit: Hah, based on the sibling comment, I stand corrected.