Comment by ericmcer
7 hours ago
I see quite a few of these. The killer feature, to me, would be one that fine-tunes the model on your own voice.
E.g., if your name is `Donold` (pronounced like Donald), no transcription model in existence will transcribe it correctly. That means you can forget about ever dictating your name or email address; the output will never be right.
Combine that with any subtleties of your speech or industry jargon you use frequently, and you have a much more useful tool.
We have a ton of options for "predict the most common word that matches this audio data" but I haven't found any "predict MY most common word" setups.
Whisper supports a prompt; you can put your "Donold" there.
https://developers.openai.com/cookbook/examples/whisper_prom...
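For anyone who wants to try the prompt approach, here is a minimal sketch, assuming the official `openai` Python SDK and the hosted `whisper-1` model. The `build_prompt` and `transcribe` helpers and the "Glossary:" wording are my own illustration, not something from the cookbook. The prompt only nudges spelling; it is not a guarantee, and as far as I know Whisper only considers a limited number of prompt tokens, so keep the list short.

```python
def build_prompt(terms):
    """Join custom spellings into a short glossary-style prompt (hypothetical helper)."""
    return "Glossary: " + ", ".join(terms)


def transcribe(path, terms, model="whisper-1"):
    """Transcribe an audio file, biasing spelling toward the given terms."""
    # Imported lazily so build_prompt works even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio,
            prompt=build_prompt(terms),
        )
    return result.text
```

Usage would look like `transcribe("memo.wav", ["Donold", "Aqua Voice"])`, where `memo.wav` is a placeholder file name.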
I've found the "corrections" feature works well for most of the jargon and misspelling use cases. Can you give it a try and let me know edge cases?
My experience is that Aqua Voice does a good job of this with its custom dictionary and replacements.
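If your tool has no such feature, a replacements pass is easy to bolt on after transcription. This is a rough sketch of the general idea, not how Aqua Voice or any particular product implements it; `apply_replacements` and the example dictionary are made up for illustration. It does whole-word, case-insensitive matching and swaps in your preferred spelling.

```python
import re


def apply_replacements(text, replacements):
    """Replace whole-word matches (case-insensitive) with the preferred spelling."""
    if not replacements:
        return text
    # Longest keys first so multi-word entries win over shorter overlapping ones.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical personal dictionary: what the model tends to hear -> what you mean.
fixes = {"Donald": "Donold", "aqua voice": "Aqua Voice"}
print(apply_replacements("Hi, I'm Donald from aqua voice.", fixes))
```

The obvious downside is that it also rewrites every real "Donald" you dictate, which is why a per-user model or prompt is the nicer fix.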