Comment by short_sells_poo

2 years ago

I'm curious how is polars group_by_dynamic easier than resample in pandas. In pandas if I want to resample to a monthly frequency anchored to the last business day of the month, I'd write:

> my_df.resample("BME").apply(...)

Done. I don't think it gets any easier than this. Every time I tried something similar with polars, I got bogged down in calendar treatment hell and large and obscure SQL like contraptions.

Edit: original tone was unintentionally combative - apologies.

1 comment

short_sells_poo

n8henrie 2 years ago

Totally fair. And thank you for the rewording (sincerely). I haven't used polars for anything business or finance related, so this is likely one of many blind spots for me.

Reviewing my work, only needed an hourly aggregation, which was similarly easy in polars and pandas (I misspoke about being easier) -- what I found easier was grouping by time data that wasn't amenable to `resample`.

In polars I had no problems using a regular group_by with a pl.col.dt object, whereas in pandas I remember struggling to do so, even though it seemed straightforward.

Sorry, I wish I could remember more details; this was probably 5 years ago that I was writing the pandas code and just converted it to polars about a year ago, so it's possible that I just got better at python in the meantime (though I was writing much more python back then). And of course a rewrite is likely to feel easier the second time.

The other confounding issue is that the eager pandas code crashed with OOM regularly and took several minutes to run, whereas polars handles it very well (which I'm sure to some degree is it optimizing things that I could have done manually), but this made iterating on this codebase feel much less onerous.