Funny enough, I actually just (2 weeks ago) added support for streaming from Pyspark to Polars/DuckDB/etc through Arrow PyCapsule. By streaming, I mean actually streaming, not collecting all data at once. It won't be released probably until May/June but it's there: https://github.com/apache/spark/commit/ecf179c3485ba8bac72af...
Funny enough, I actually just (2 weeks ago) added support for streaming from Pyspark to Polars/DuckDB/etc through Arrow PyCapsule. By streaming, I mean actually streaming, not collecting all data at once. It won't be released probably until May/June but it's there: https://github.com/apache/spark/commit/ecf179c3485ba8bac72af...
Not that I’m aware of. The Spark ecosystem seems a little too “stable” to be putting effort into that kind of development.
Edit: hah, based on the sibling comment, I stand corrected