Comment by davedx
2 months ago
> This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster.
On CPU or GPU?
This is NumPy we are discussing. It doesn't use the GPU.
To be fair, you could replace `import numpy as np` with `import cupy as np` and it would run on the GPU without further changes. It doesn't perform well, though: PyTorch is roughly 12 times faster.
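A minimal sketch of that drop-in swap. It assumes CuPy may or may not be installed with a working CUDA setup, so it falls back to NumPy and still runs on a CPU-only machine. `pairwise_sq_dists` is a hypothetical vectorized kernel for illustration only, not the implementation from the quoted article, and no timings are implied.

```python
# Use CuPy when it is installed and a GPU is usable; otherwise fall back to NumPy.
try:
    import cupy as np  # GPU arrays with a largely NumPy-compatible API
    np.zeros(1)  # force a tiny allocation so a missing CUDA device/driver fails here
except Exception:
    import numpy as np  # CPU fallback


def pairwise_sq_dists(a, b):
    """Hypothetical vectorized kernel: squared Euclidean distances between rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]  # broadcast to shape (len(a), len(b), dim)
    return (diff * diff).sum(axis=-1)


x = np.arange(6, dtype=np.float32).reshape(3, 2)  # rows [0, 1], [2, 3], [4, 5]
d = pairwise_sq_dists(x, x)
print(np.__name__, float(d[0, 1]))  # d[0, 1] = (0-2)**2 + (1-3)**2 = 8.0
```

The same function body runs unchanged on either backend, which is the point of the import swap. For code that must accept both array types at once, CuPy also documents `cupy.get_array_module` as an alternative to rebinding `np`.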