Comment by WCSTombs
11 hours ago
If your arrays have more than two dimensions, please consider using Xarray [1], which adds dimension naming to NumPy arrays. Broadcasting and alignment then become automatic, without needing to transpose, add dummy axes, or anything like that. I believe that alone solves most of the complaints in the article.
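As a sketch of what that looks like (a made-up example, not from the article): two arrays with different named dimensions combine by name, so the usual dummy-axis dance disappears.

```python
import numpy as np
import xarray as xr

# Two 1-D arrays with different, named dimensions.
a = xr.DataArray(np.arange(3), dims="x")  # shape (3,)
b = xr.DataArray(np.arange(4), dims="y")  # shape (4,)

# Multiplication broadcasts by dimension name: no transposing,
# no a[:, None] * b[None, :] dance.
outer = a * b
print(outer.dims, outer.shape)  # ('x', 'y') (3, 4)
```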
Compared to NumPy, Xarray is a little thin in certain areas, like linear algebra. But since it's very easy to drop back to NumPy from Xarray, what I've done in the past is add little helper functions for any specific NumPy functionality I need that isn't already included. That way I only need to understand the NumPy version of the API well enough one time, to write the helper function and its tests. (To be clear, though, the majority of NumPy ufuncs are supported out of the box.)
I'll finish by saying, to contrast with the author: I don't dislike NumPy, but I do find its API and data model insufficient for truly multidimensional data. For me, three dimensions is the threshold where using Xarray pays off.
[1] https://xarray.dev/
Xarray is great. It marries the best of Pandas with NumPy.
Indexing like `da.sel(x=some_x).isel(t=-1).mean(["y", "z"])` makes code so easy to write and understand.
Broadcasting is never ambiguous because dimension names are respected.
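To make that indexing style concrete, here's a minimal sketch (dimension names and data invented for illustration):

```python
import numpy as np
import xarray as xr

# A 4-D array with named dims and a labeled x coordinate.
da = xr.DataArray(
    np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5),
    dims=("t", "x", "y", "z"),
    coords={"x": [0.0, 0.5, 1.0]},
)

# Label-based selection on x, positional on t, then reduce over y and z;
# no need to remember which axis number is which.
result = da.sel(x=0.5).isel(t=-1).mean(["y", "z"])
print(float(result))  # 89.5
```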
It's very good for geospatial data, allowing you to work in multiple CRSs with the same underlying data.
We also use it a lot for Bayesian modeling via Arviz [1], since it makes the extra dimensions you get from sampling your posterior easy to handle.
Finally, you can wrap many arrays into datasets, with common coordinates shared across the arrays. This allows you to select `ds.isel(t=-1)` across every array that has a time dimension.
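A quick sketch of that dataset behavior (variable names made up):

```python
import numpy as np
import xarray as xr

# Two arrays sharing a time dimension, bundled into one Dataset.
ds = xr.Dataset(
    {
        "temp": (("t", "y"), np.zeros((4, 3))),
        "pressure": (("t",), np.arange(4.0)),
    }
)

# One selection applies to every variable with a t dimension.
last = ds.isel(t=-1)
print(last["temp"].shape, float(last["pressure"]))  # (3,) 3.0
```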
[1] https://www.arviz.org/en/latest/
Seconded. Xarray has mostly replaced bare NumPy for me and it makes me so much more productive.
Is there anything similar to this for something like TensorFlow, Keras, or PyTorch? I haven't used them super recently, but in the past I needed to do all of the things you just described in painful-to-debug ways.
For Torch, I have come across Named Tensors, which should work in a similar way: https://docs.pytorch.org/docs/stable/named_tensor.html
The docs say it's a prototype feature, and I think it has been that way for a few years now, so I have no idea how production-ready it is.
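For reference, the basic usage looks roughly like this (a sketch based on the linked docs; since it's a prototype, details may change):

```python
import torch

# Construct a tensor with named dimensions.
t = torch.zeros(2, 3, names=("N", "C"))
print(t.names)  # ('N', 'C')

# Reductions and reordering can use names instead of axis numbers.
s = t.sum("C")            # reduce over the channel dim
u = t.align_to("C", "N")  # transpose by name
print(s.names, u.shape)
```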
It's a much worse API than Xarray's; it seems like somebody should build something like Xarray on top of PyTorch.
I really like einops. It works with NumPy, PyTorch, and Keras/TensorFlow, and has easy named transpose, repeat, and einsum operations.
Same - I’ve been using einops and jaxtyping together pretty extensively recently, and they help a lot for reading and writing multidimensional array code. Also array_api_compat: the API coverage isn’t perfect, but it’s pretty satisfying to write code that works for both PyTorch tensors and NumPy arrays.
https://docs.kidger.site/jaxtyping/
https://data-apis.org/array-api-compat/
For pytorch the analogue is Named Tensors, but it's a provisional feature and not supported everywhere.
https://docs.pytorch.org/docs/stable/named_tensor.html
Thanks for sharing this library. I will give it a try.
For a while I had a feeling that I was perhaps a little crazy for seeming to be the only person who really dislikes the use of things like `array[:, :, None]` and so forth.
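For readers who haven't hit this: `array[:, :, None]` is NumPy's idiom for inserting a dummy length-1 axis so shapes line up for broadcasting. A small sketch of the idiom and its slightly more readable equivalent:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# Insert a trailing axis of length 1 so x can broadcast against 3-D data.
a = x[:, :, None]               # shape (2, 3, 1)
b = np.expand_dims(x, axis=-1)  # same thing, but says what it does

print(a.shape, b.shape, np.array_equal(a, b))
```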
xarray is nice
Along those lines, for biosignal processing, NeuroPype [0] also builds on NumPy and implements named axes for n-dimensional tensors, with the ability to store per-element data (e.g. channel names, positions, etc.) for each axis.
[0] https://www.neuropype.io/docs/
Life goes full circle sometimes. I remember that Numpy roughly came out of the amalgamation of the Numeric and Numarray libraries; I want to imagine that the Numarray people kept fighting these past 20 years to prove they were the right solution, at some point took some money from Elon Musk and renamed to Xarray [0], and finally started beating Numpy.
[0] most of the above is fiction