Comment by ranger_danger 3 days ago
How is this so much faster than even GPU-based Whisper?

mrkn1 3 days ago
Small, ONNX-optimized models designed specifically for low-latency CPU streaming, so it avoids the overhead of a large transformer architecture and of GPU memory transfers.