Comment by AustinDev
8 days ago
Audio models are also tiny, which is probably why small labs are doing well in the space. I run a LoRA-tuned Whisper large-v3 for a client. We can fit four versions of the model in memory at once on a ~$1/hr A10 and still have half the VRAM left over.
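A rough back-of-the-envelope check of the memory claim. This sketch assumes each "version" is a full copy of the weights with the LoRA merged in, Whisper large-v3 at about 1.55B parameters, fp16 weights (2 bytes per parameter), and a 24 GB A10. It ignores activations, decoding caches, and framework overhead, so real usage will be somewhat higher:

```python
# Back-of-the-envelope VRAM estimate for several Whisper large-v3 copies.
# Assumptions (not stated in the comment): ~1.55B params per model,
# fp16 weights, a 24 GB A10, and no activation/cache/framework overhead.
PARAMS = 1.55e9
BYTES_PER_PARAM = 2  # fp16
A10_VRAM_GB = 24


def weights_gb(n_models: int,
               params: float = PARAMS,
               bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for n_models full copies."""
    return n_models * params * bytes_per_param / 1e9


used = weights_gb(4)
print(f"4 models: {used:.1f} GB of {A10_VRAM_GB} GB ({used / A10_VRAM_GB:.0%})")
```

Under these assumptions four copies come to roughly 12.4 GB, about half the card. Keeping one base model and swapping small LoRA adapters would use less still.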
Each of the LoRA tunes we did took maybe 2-3 hours on the same A10 instance.
Is Whisper still getting nontrivial development? I was under the impression that it had stagnated, but it's hard to find more than rumors.
My ~1.7% WER and faster-than-realtime processing make it more than adequate for my application, which is multi-speaker with sustained speaking rates above 300 WPM.
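For context on what a figure like ~1.7% WER means, here is a minimal sketch of word error rate: (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level edit distance. It assumes plain whitespace tokenization with no casing or punctuation normalization. Published WER numbers usually normalize text first, so this is not necessarily how the figure above was measured:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance.

    Whitespace tokenization only; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

At 1.7% WER that works out to roughly 17 word errors per 1,000 reference words.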