Comment by data-ottawa
6 hours ago
Map is one operation pandas does nicely that most other “wrap a fast language” dataframe tools do poorly.
When it feels like you're writing some external UDF that's executed in another environment, it does not feel as nice as throwing in a lambda, even if the lambda is not ideal.
You have map_elements in Polars, which does exactly this.
https://docs.pola.rs/api/python/dev/reference/expressions/ap...
You can also iter_rows into a lambda if you really want to.
https://docs.pola.rs/api/python/stable/reference/dataframe/a...
Personally I find it extremely rare that I need to do this given Polars expressions are so comprehensive, including when.then.otherwise when all else fails.
That one has a bit more friction than pandas because of the return schema requirement -- pandas lets you get away with this bad practice.
It also does batches when you declare scalar outputs, but you can't control the batch size, which usually isn't an issue, but I've run into situations where it is.
The only cases I've really found where I have to move out of vectorization semantics (numpy, pandas, polars et al.) are when I'm contorting so much that I probably shouldn't be vectorizing in the first place. And yes, then we're back to slow ol' Python, which is the motivation for going as far as I can in vector libraries in the first place, sometimes too far. In these cases I think dropping into C/Zig/Rust makes more sense, but I get that that is high friction.