Comment by lmeyerov
1 year ago
The last point is spot on... Pandas on GPUs (cudf) gets you both the perf + usability, without having to deal with issues common to stack/array languages (k) and lazy languages (dask, polars). My flow is pandas -> cudf -> dask cudf , spark, etc.
More recently, we have been working on GFQL with users at places like banks (graph dataframe query language), where we translate down to tools like pandas & cudf. A big "aha" is that columnar operations are great -- not far from what array languages focus on -- and having a static/dynamic query planner so optimizations around that helps once you hit memory limits. Eg, dask has dynamic DFS reuse of partitions as part of its work stealing. More SQL-y tools like Spark may make plans like that ahead of time. In contrast, that lands more on the user if they stick with pandas or k, eg, manual tiling.
No comments yet
Contribute on Hacker News ↗