Comment by paddy_m

2 days ago

Have you tried polars? The syntax is much more regular, which fits well with lazy execution and makes it very composable for building queries programmatically. And it’s super fast.

I found the biggest benefit of polars is, ironically, the loss of the thing I thought I would miss most: the index. With pandas there are columns, indices, and multi-indices, whereas with polars everything is a column. It’s all the same, so you can delete a lot of conditionals.

However, I still find myself using pandas for timestamps, timedeltas, and date offsets. Even so, I need a whole extra column just to hold time zones: polars maps everything to a UTC storage zone, so you lose the origin/local TZ, which screws up heterogeneous time zone datasets. (I also learned that you really need to enforce careful, deliberate handling of time zone replacement vs. offsetting at the API level.)

I had to write a ton of code to deal with this. I wish polars had an explicit separation of local vs. storage zones on the Datetime data type.
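
To make the replacement vs. offsetting distinction concrete, here is a rough sketch using only the Python standard library (not polars or pandas). It uses fixed UTC offsets instead of real zone names to stay dependency-free, and the `row` layout with an `origin_offset_min` field is a hypothetical stand-in for the "extra column" workaround.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained; real data would use named zones.
NEW_YORK = timezone(timedelta(hours=-5), "EST")
TOKYO = timezone(timedelta(hours=9), "JST")

naive = datetime(2024, 1, 15, 9, 0)

# Replacement: attach a zone, wall clock stays 09:00.
local = naive.replace(tzinfo=NEW_YORK)

# Offsetting (conversion): same instant, different wall clock.
utc = local.astimezone(timezone.utc)    # 14:00 UTC
tokyo_view = local.astimezone(TOKYO)    # 23:00 JST, still the same instant

# Replacing the zone on an aware value silently changes the instant.
mislabeled = local.replace(tzinfo=TOKYO)  # 09:00 JST, i.e. 00:00 UTC

# Workaround: store the UTC instant plus the origin offset separately
# (hypothetical field names), then rebuild the local view on read.
row = {"ts_utc": utc, "origin_offset_min": -300}
origin = timezone(timedelta(minutes=row["origin_offset_min"]))
restored = row["ts_utc"].astimezone(origin)

print(local, utc, tokyo_view, mislabeled, restored, sep="\n")
```

Mixing up the `replace` and `astimezone` steps is exactly the kind of bug that is easy to miss, because both return a perfectly valid-looking datetime.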

  • I think pandas was so ambitious, syntax-wise and concept-wise, but it got to be a bit of a jumble. The index idea in particular is so cool, particularly multi-indexes; watching people who really understand it do multi-index operations is very cool.

    IMO Polars sets a different goal: what's the most pandas-like thing we can build that is fast (and leaves open the possibility for more optimization) and clean?

    Polars feels like you are obviously manipulating an advanced query engine. Pandas feels like manipulating a squishy data structure that should be super useful and friendly, but sometimes it does something dumb and slow.