
Comment by ssivark

1 day ago

Setting aside complaints about the Pandas API, it's frustrating that we might see the community of a popular "standard" tool fragment into two or even three ecosystems (for libraries with slightly incompatible APIs) -- seemingly all with the value proposition of "making it faster". Based on the machine learning experience over the last decade, this kind of churn in tooling is somewhat exhausting.

I wonder how much of this is fundamental to the common approach of writing libraries in Python with the processing-heavy parts delegated to C/C++ -- that the expressive parts cannot be fast and the fast parts cannot be expressive. I also wonder whether Rust (used by Polars and other newer-generation libraries) changes that tradeoff substantially.
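A toy illustration of that split, using only the standard library (my sketch, not anything from the thread): Python's built-in `sum` is implemented in C, so it typically beats an equivalent Python-level loop by a wide margin -- but the C fast path covers only that one fixed operation, and the moment you need arbitrary per-element logic you fall back to interpreted code.

```python
import timeit

data = list(range(1_000_000))

# Python-level loop: fully expressive (you could put any per-element
# logic in the body), but every iteration runs in the interpreter.
def py_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

# The built-in sum is implemented in C: fast, but only for this one
# fixed operation -- you can't inject custom logic into it.
t_loop = timeit.timeit(lambda: py_sum(data), number=5)
t_c = timeit.timeit(lambda: sum(data), number=5)

print(f"Python loop: {t_loop:.3f}s, C built-in: {t_c:.3f}s")
```

The gap here is exactly what pandas papers over with vectorized C/Cython kernels, and what users hit the moment they reach for `apply` with a Python callback.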

I think it's a natural path of software life that compatibility often stands in the way of improving the API.

This really does seem like a rare case where everything speeds up without breaking compatibility. If you want a fast, revised API for a new project (or to rework an existing one), Polars is the solution for that. If you just want your existing code and workloads to run faster, you now have a solution for that too.

It's OK to have a slow, compatible, static codebase to build things on then optimize as-needed.

Trying to "fix" the API would break a ton of existing code, including existing plugins. Orphaning those projects and codebases would be the wrong move; those things take a decade to flesh out.

This really doesn't seem like the worst outcome, and doesn't seem to be creating a huge fragmented mess.

> Based on the machine learning experience over the last decade, this kind of churn in tooling is somewhat exhausting.

Don't come to old web devs with those complaints: every single one of them had to write at least one open-source JavaScript library just to create their LinkedIn account!