treksis 21 days ago: How fast is this compared to Python-based implementations?
antirez 21 days ago: Very slow currently; I added the benchmarks to the README. To go faster it needs inference kernels quicker than the current float32-only ones.
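(Editor's illustration, not code from antirez's repo: one common way a plain float32 kernel gets faster is restructuring it so the compiler can auto-vectorize. The two functions below are a hypothetical sketch of that, contrasting a naive float32 matrix-vector product with a restrict-qualified, multi-accumulator version.)

    #include <stddef.h>

    /* Naive float32 matvec: out[i] = sum_j w[i*cols + j] * x[j].
     * The dependency on a single accumulator and possible aliasing
     * often keep the compiler from vectorizing this loop. */
    void matvec_naive(float *out, const float *w, const float *x,
                      size_t rows, size_t cols) {
        for (size_t i = 0; i < rows; i++) {
            float acc = 0.0f;
            for (size_t j = 0; j < cols; j++)
                acc += w[i * cols + j] * x[j];
            out[i] = acc;
        }
    }

    /* Same kernel with restrict qualifiers and four independent
     * accumulators, giving the compiler room to use SIMD lanes
     * (e.g. with -O3) without hand-written intrinsics. */
    void matvec_unrolled(float *restrict out, const float *restrict w,
                         const float *restrict x, size_t rows, size_t cols) {
        for (size_t i = 0; i < rows; i++) {
            float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
            size_t j = 0;
            for (; j + 4 <= cols; j += 4) {
                acc0 += w[i * cols + j + 0] * x[j + 0];
                acc1 += w[i * cols + j + 1] * x[j + 1];
                acc2 += w[i * cols + j + 2] * x[j + 2];
                acc3 += w[i * cols + j + 3] * x[j + 3];
            }
            float acc = acc0 + acc1 + acc2 + acc3;
            for (; j < cols; j++)        /* scalar tail */
                acc += w[i * cols + j] * x[j];
            out[i] = acc;
        }
    }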
rcarmo 21 days ago: The Python libraries are themselves written in C/C++, so performance-wise this at best cuts through some glue. Don't think of this as a performance-driven implementation.
throwaway314155 21 days ago: PyTorch MPS is about 10x faster per the README.md.
antirez 21 days ago: I cut the speed difference in half by computing the activations on the GPU. Time to sleep, but I'll continue tomorrow.
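(Illustrative sketch of the pattern antirez describes: keeping activations resident on the GPU so they don't round-trip to the host on every layer. The gpu_* functions are hypothetical stand-ins for Metal compute calls, declared but not implemented.)

    #include <stddef.h>

    /* Hypothetical device API, standing in for Metal compute calls. */
    void gpu_upload(float *dev, const float *host, size_t n);
    void gpu_download(float *host, const float *dev, size_t n);
    void gpu_matmul(float *dev_out, const float *dev_w, const float *dev_in);

    /* Slow pattern: activations round-trip to the host every layer. */
    void forward_roundtrip(float *host_act, float *dev_in, float *dev_out,
                           float **dev_w, int n_layers, size_t n) {
        for (int l = 0; l < n_layers; l++) {
            gpu_upload(dev_in, host_act, n);       /* host -> device */
            gpu_matmul(dev_out, dev_w[l], dev_in); /* compute on GPU */
            gpu_download(host_act, dev_out, n);    /* device -> host */
        }
    }

    /* Faster pattern: activations stay resident on the device; only
     * the final result is read back. */
    void forward_resident(float *host_out, const float *host_in,
                          float *dev_act, float **dev_w,
                          int n_layers, size_t n) {
        gpu_upload(dev_act, host_in, n);
        for (int l = 0; l < n_layers; l++)
            gpu_matmul(dev_act, dev_w[l], dev_act); /* stays on GPU */
        gpu_download(host_out, dev_act, n);
    }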
Numerlor 21 days ago: Have you tried e.g. Mojo, which can vectorize/do SIMD without needing intrinsics everywhere?
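(For comparison with what Numerlor suggests: Mojo bakes vectorization into the language, but GCC and Clang vector extensions also give portable SIMD in C without per-ISA intrinsics, compiling to NEON or AVX as appropriate. An illustrative sketch, not code from this project.)

    #include <stddef.h>
    #include <string.h>

    /* 8-lane float vector via GCC/Clang vector extensions. */
    typedef float f32x8 __attribute__((vector_size(32)));

    float dot_vec(const float *a, const float *b, size_t n) {
        f32x8 acc = {0};
        size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            f32x8 va, vb;
            memcpy(&va, a + i, sizeof va);  /* unaligned load */
            memcpy(&vb, b + i, sizeof vb);
            acc += va * vb;  /* element-wise multiply across 8 lanes */
        }
        float sum = 0.0f;
        for (int k = 0; k < 8; k++)
            sum += acc[k];                  /* horizontal reduction */
        for (; i < n; i++)
            sum += a[i] * b[i];             /* scalar tail */
        return sum;
    }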