Comment by __mharrison__
8 hours ago
Great writeup.
I've been in the pandas (and now polars) world for the past 15 years. Staying in the sandbox gets most folks good-enough performance. (That's why Python is the language of data science and ML.)
I generally teach my clients to reach for numba first. Potentially lots of bang for little buck.
One overlooked area in the article is running on GPUs. Some numpy and pandas (and polars) code can get a big speedup by using GPUs (same code with import change).
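For pandas specifically, the "same code with import change" path is NVIDIA's `cudf.pandas` accelerator mode; a sketch of how it's invoked, assuming a CUDA GPU and the RAPIDS cuDF package are available (package name varies by CUDA version):

```shell
# Run an unmodified pandas script with GPU acceleration;
# operations cuDF doesn't support fall back to CPU pandas.
python -m cudf.pandas my_script.py

# Or in a Jupyter notebook, before importing pandas:
#   %load_ext cudf.pandas
```

The script itself keeps its ordinary `import pandas as pd` lines untouched.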
Taichi, benchmarked in the article, claims to be able to outperform CUDA at some GPU tasks, although their benchmarks look to be a few years old:
https://github.com/taichi-dev/taichi_benchmark
And they don't account for cuTile, NVIDIA's new API infrastructure that supports writing CUDA directly in Python via an MLIR-based JIT.