Comment by AustinDev

8 days ago

Audio models are also tiny, which is probably why small labs are doing well in the space. I run a LoRA'd Whisper v3 Large for a client. We can fit 4 versions of the model in memory at once on a ~$1/hr A10 and still have half the VRAM left over.

Each of the LoRA tunes we did took maybe 2-3 hours on the same A10 instance.
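
As a rough sanity check on the memory figure above, here is a back-of-the-envelope sketch. It assumes roughly 1.55B parameters for Whisper large-v3, fp16 weights (2 bytes per parameter), and a 24 GB A10. It counts weights only and ignores activations, decoder caches, and framework overhead. The precision and the four-full-copies layout are assumptions for illustration, not details confirmed in the comment.

```python
# Weights-only VRAM estimate. All constants are assumptions for illustration.
PARAMS = 1.55e9        # approximate parameter count of Whisper large-v3
BYTES_PER_PARAM = 2    # assumption: fp16/bf16 weights
A10_VRAM_GB = 24.0     # NVIDIA A10 memory capacity
COPIES = 4             # number of model versions held in memory at once


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


per_model_gb = weights_gb(PARAMS, BYTES_PER_PARAM)
total_gb = per_model_gb * COPIES
leftover_fraction = (A10_VRAM_GB - total_gb) / A10_VRAM_GB

print(f"per model: {per_model_gb:.1f} GB")
print(f"{COPIES} copies: {total_gb:.1f} GB")
print(f"VRAM left over: {leftover_fraction:.0%}")
```

Under these assumptions each copy is about 3.1 GB and four copies use about 12.4 GB, which leaves close to half of the card free and is consistent with the numbers quoted.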

2 comments

freedomben 8 days ago

Is Whisper still getting nontrivial development? I was under the impression that it had stagnated, but it seems hard to find anything more than rumors.

AustinDev 8 days ago

My ~1.7% WER and faster-than-realtime processing in my application make it more than adequate. My application is multi-speaker, with WPM rates >300 for long durations.
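
For readers unfamiliar with the two metrics mentioned above, here is a minimal sketch of how they are commonly computed. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and the realtime factor is processing time divided by audio duration (below 1.0 means faster than realtime). The function names are hypothetical, and the whitespace tokenization is a simplifying assumption; real evaluations usually normalize casing and punctuation first. This is not the commenter's evaluation code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean the audio is processed faster than realtime."""
    return processing_seconds / audio_seconds


# One substituted word out of six reference words.
example_wer = wer("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {example_wer:.3f}")
print(f"RTF: {realtime_factor(30.0, 60.0):.2f}")
```

To put the quoted figures in perspective: at 300 words per minute, a 1.7% WER works out to roughly 5 misrecognized words per minute of speech.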