
Comment by d3m0t3p

1 year ago

You might want to check pola.rs then; it's backed by the Apache Arrow memory model and it's written in Rust. All the columns have a defined type, so you can easily catch a mistake when loading data.

Unless I'm misunderstanding, Arrow solves data representation on disk and in memory, for both pandas and polars, while I'm writing about type inference during static analysis, which Arrow doesn't solve.

Having a type checking system respect Arrow schemas is indeed our ideal. Will polars, during a mypy static type checking invocation, catch something like `df.this_col_is_missing` as an error? If so, that's exactly what we want, that's great!

FWIW, we donated some of the first versions of what became Apache Arrow ;-)

I've been hunting down column-level typing for a while and did not realise polars had this! That's an absolute game changer, especially if it could cover things like nullability, uniqueness, etc.

  • It's not static; it's basically the same as pandas. Your editor will not know the type of a given column or whether it even exists; all of that happens at runtime.
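To illustrate why a static checker can't help here, below is a minimal stdlib-only sketch (a toy stand-in, not real polars or pandas code) of the dynamic-lookup pattern: because column access goes through `__getattr__`, mypy treats any attribute name as valid, and a typo like `df.this_col_is_missing` only fails when the code actually runs.

```python
from typing import Any


class FakeFrame:
    """Toy DataFrame-like class; hypothetical, for illustration only."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self._columns = columns

    def __getattr__(self, name: str) -> list[Any]:
        # Column lookup happens at runtime. A static checker sees
        # __getattr__ and accepts *any* attribute name, so misspelled
        # column names are only caught when this line executes.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(f"no column named {name!r}") from None


df = FakeFrame({"price": [1.0, 2.0]})
df.price  # fine: the column exists

try:
    df.this_col_is_missing  # mypy accepts this; it fails only at runtime
except AttributeError as err:
    print(err)
```

This is the gap being discussed: Arrow gives each column a well-defined runtime type, but nothing in this access pattern surfaces the schema to a static analyzer.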

Do you have a reference for how to use static typing for polars columns? I haven't seen this in their docs...