Comment by jvican

1 month ago

Is there any plan for this?


jvican

Funnily enough, I actually just added support (2 weeks ago) for streaming from PySpark to Polars/DuckDB/etc. through the Arrow PyCapsule interface. By streaming, I mean actually streaming, not collecting all the data at once. It probably won't be released until May/June, but it's there: https://github.com/apache/spark/commit/ecf179c3485ba8bac72af...

datsci_est_2015 1 month ago

Not that I’m aware of. The Spark ecosystem seems a little too “stable” to be putting effort into that kind of development.

Edit: hah, based on the sibling comment, I stand corrected.