Comment by anonymousiam

10 days ago

Whisper is excellent, but not perfect.

I used Whisper last week to transcribe a phone call. In the transcript, the name of the person I was speaking with (Gem) was alternately transcribed as either "Jim" or "Jem", but never "Gem."

Whisper supports supplying context as a prompt, and if you're transcribing a phone call, you should probably add something like "Transcribe this phone call with Gem", in which case it would probably spell the name correctly.
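For anyone who wants to try this, here is a rough sketch of passing that kind of context, assuming the open-source openai-whisper Python package and its `initial_prompt` argument to `transcribe`. The helper names, the prompt wording, and the "base" model choice are my own illustration, not anything from the call described above, and the prompt only biases spelling rather than guaranteeing it.

```python
def build_initial_prompt(names):
    """Build a short context string that mentions each expected name.

    Hypothetical helper: the wording is an assumption, not a required format.
    """
    cleaned = [n.strip() for n in names if n.strip()]
    if not cleaned:
        return ""
    return "Phone call with " + ", ".join(cleaned) + "."


def transcribe_with_names(audio_path, names, model_name="base"):
    """Transcribe an audio file, hinting the expected spellings of names."""
    # Imported lazily so build_initial_prompt works without the package installed.
    import whisper  # assumes `pip install openai-whisper`

    model = whisper.load_model(model_name)
    # Pass None rather than an empty string when there are no names to hint.
    prompt = build_initial_prompt(names) or None
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"]
```

Usage would be something like `transcribe_with_names("call.wav", ["Gem"])`, where "call.wav" is a placeholder path.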

That's at least as good as a human, though. Getting to "better-than-human" in that situation would probably require lots of potentially-invasive integration to allow the software to make correct inferences about who the speakers are in order to spell their names correctly, or manually supplying context as another respondent mentioned.

  • When she told me her name, I didn't ask her to repeat it, and I got it right through the rest of the call. Whisper didn't, so how is this "at least as good as a human"?

    • I wouldn't expect any transcriber to know that the correct spelling in your case used a G rather than a J - the J is far more common in my experience. "Jim" would be an aberration that could be improved, but substituting "Jem" for "Gem" without any context to suggest the latter would be just fine IMO.