Comment by shrx

10 days ago

As a hard of hearing person, I can now download any video from the internet (e.g. YouTube) and generate subtitles on the fly, not having to struggle to understand badly recorded or unintelligible speech.

If the dialog is badly recorded or the speech is unintelligible, how would a transcription process get it right?

  • Because it can use the full set of information in the audio - people with hearing difficulties cannot. Also interesting: people with perfectly functional hearing who have "software" bugs (e.g. I find it extremely hard to process voices with significant background noise) can also benefit :)

    • I have that issue as well - I can hear faint noises OK but if there's background noise I can't understand what people say. But I'm pretty sure there's a physical issue at the root of it in my case. The problem showed up after several practice sessions with a band whose guitarist insisted on always playing at full volume.

  • The definition of "unintelligible" varies by person, especially by accent. Like, I have no problem understanding the average person from Germany... but someone from the deep backwaters of Saxony, forget about that.

I did this as recently as today, for that reason, using ffmpeg and whisper.cpp. But not on the fly. I ran it on a few videos to generate VTT files.
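
For anyone who wants to try the same thing, here is a rough sketch of how the two steps could be chained from Python. It is not necessarily the exact setup used above. It assumes ffmpeg is on your PATH and that you have a whisper.cpp build plus a downloaded model. The binary path and model path are placeholders you pass in yourself, and the function names are made up for illustration. The 16 kHz mono 16-bit WAV conversion is the input format the whisper.cpp README asks for.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video: str, wav: str) -> list[str]:
    """Build an ffmpeg command that extracts 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-y",        # overwrite the WAV if it already exists
        "-i", video,
        "-vn",                 # drop the video stream
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        wav,
    ]


def whisper_vtt_cmd(wav: str, out_base: str, whisper_bin: str, model: str) -> list[str]:
    """Build a whisper.cpp command that writes <out_base>.vtt.

    whisper_bin and model are assumptions: point them at your own
    whisper.cpp executable and ggml model file.
    """
    return [whisper_bin, "-m", model, "-f", wav, "-ovtt", "-of", out_base]


def transcribe_to_vtt(video: str, whisper_bin: str, model: str) -> str:
    """Run both steps and return the expected path of the VTT file."""
    base = str(Path(video).with_suffix(""))
    wav = base + ".wav"
    subprocess.run(ffmpeg_extract_cmd(video, wav), check=True)
    subprocess.run(whisper_vtt_cmd(wav, base, whisper_bin, model), check=True)
    return base + ".vtt"
```

Calling `transcribe_to_vtt("talk.mp4", "/path/to/whisper-binary", "/path/to/model.bin")` should leave a `talk.vtt` next to the video, which most players will pick up as a subtitle track. This runs as a batch job per file, so like the above it is not "on the fly".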