treksis 1 month ago
How fast is this compared to the Python-based implementations?
antirez 1 month ago
Very slow currently; I added the benchmarks to the README. To go faster, it needs inference implemented with faster kernels than the current float32-only ones.
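For readers unfamiliar with the term, a hypothetical sketch (not the project's actual code) of the kind of plain float32 matrix-vector kernel being referred to — each output element is a scalar dot product, with no SIMD, quantization, or cache blocking, which is why such kernels are simple to write but slow:

```c
#include <stddef.h>

/* Naive float32 matvec: y = W * x, with W stored row-major.
 * One scalar multiply-add per inner iteration; nothing for the
 * hardware's vector units to chew on unless the compiler helps. */
static void matvec_f32(const float *w, const float *x, float *y,
                       size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; c++)
            acc += w[r * cols + c] * x[c];
        y[r] = acc;
    }
}
```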
rcarmo 1 month ago
The Python libraries are themselves written in C/C++, so performance-wise this at best cuts through some glue. Don't think of this as a performance-driven implementation.
throwaway314155 1 month ago
PyTorch MPS is about 10x faster, per the README.md.
antirez 1 month ago
I cut the speed difference in half by computing the activations on the GPU. Time to sleep, but I'll continue tomorrow.
Numerlor 1 month ago
Have you tried e.g. Mojo, which can vectorize/do SIMD without needing intrinsics everywhere?
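The "no intrinsics" approach isn't unique to Mojo: in C you can often get SIMD from a plain scalar loop by letting the compiler auto-vectorize it. A minimal sketch, assuming gcc or clang at `-O2`/`-O3` — `restrict` tells the compiler the arrays don't alias, which is a common blocker for vectorization:

```c
#include <stddef.h>

/* Plain scalar dot product. With `restrict` and optimization enabled,
 * gcc/clang will typically emit SIMD loads and fused multiply-adds for
 * this loop without any hand-written intrinsics. */
static float dot_f32(const float *restrict a, const float *restrict b,
                     size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

One caveat: vectorizing the reduction reorders floating-point additions, so compilers only do it under flags like `-ffast-math` (or `-O3 -march=native` plus relaxed FP semantics); without those, the strict left-to-right sum keeps the loop scalar.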