Comment by ranger_danger

3 days ago

How is this so much faster than even GPU-based Whisper?

1 comment

ranger_danger

It uses small, ONNX-optimized models designed specifically for low-latency CPU streaming, so it avoids the overhead of a large transformer architecture and of GPU memory transfers.
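To make the overhead point concrete, here is a rough toy model of per-chunk latency in a streaming setup. It is only a sketch: the function names and every number in it are made-up placeholders, not measurements of this project or of Whisper. The idea is that each short audio chunk pays a fixed per-call cost (for example, host-to-device copies) on top of the model's compute time, and for short chunks that fixed cost can end up dominating.

```python
def chunk_latency_ms(compute_ms: float, fixed_overhead_ms: float = 0.0) -> float:
    """Latency for one streaming chunk: fixed per-call cost plus model compute."""
    return fixed_overhead_ms + compute_ms


def real_time_factor(latency_ms: float, chunk_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 keeps up with real time."""
    return latency_ms / chunk_ms


# Placeholder values for illustration only (assumptions, not benchmarks).
CHUNK_MS = 100.0

# Hypothetical small CPU model: modest compute, no device transfer.
small_cpu = chunk_latency_ms(compute_ms=8.0)

# Hypothetical large GPU model: faster raw compute, but a fixed transfer
# and launch cost is paid on every chunk.
large_gpu = chunk_latency_ms(compute_ms=5.0, fixed_overhead_ms=20.0)

print("small CPU model RTF:", real_time_factor(small_cpu, CHUNK_MS))
print("large GPU model RTF:", real_time_factor(large_gpu, CHUNK_MS))
```

With these placeholder inputs the GPU path has the lower compute time but the higher total latency per chunk, which is the general shape of the argument. Real numbers depend heavily on the hardware, the model, and the chunk size.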