Comment by davedx
8 hours ago
> This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster.
On CPU or GPU?