Comment by 42lux

10 days ago

You usually delete silence before using something like whisper.


42lux

I've heard that, but it doesn't sound like a useful approach for videos where (1) non-speech segments can have plenty of other sound (music, noise) and (2) you want timestamps to match up with the original video, like for subtitles. But maybe there are known mitigations for both of those issues that I'm not aware of. And if they do exist, maybe they can be included in the ffmpeg whisper integration.

miki123211 10 days ago
By "delete", people mostly mean "detect", so that you can avoid processing such segments through Whisper. There's no reason to actually cut the silence out from the original audio file.
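
To make "detect, don't cut" concrete, here is a minimal sketch. It assumes mono audio already decoded to a list of floats in [-1, 1] and uses a plain RMS energy threshold. That is a stand-in of my own, not what ffmpeg or Whisper do internally; real pipelines usually use a trained voice activity detector, and an energy threshold would not separate speech from music. The function name and all parameters are made up for illustration.

```python
import math

def speech_segments(samples, sample_rate, frame_ms=30,
                    threshold=0.01, min_silence_ms=300):
    """Return (start_sec, end_sec) spans of non-silent audio.

    Times are offsets into the ORIGINAL audio; nothing is cut out.
    Hypothetical helper: a crude energy gate, not a real VAD.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    max_gap = min_silence_ms / 1000.0
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square energy of this frame.
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            continue  # quiet frame: not part of any segment
        t0 = start / sample_rate
        t1 = (start + len(frame)) / sample_rate
        # Merge with the previous segment if the silent gap is short.
        if segments and t0 - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], t1)
        else:
            segments.append((t0, t1))
    return segments

# 1 s silence, 1 s signal, 1 s silence, 1 s signal at a toy 1 kHz rate.
audio = [0.0] * 1000 + [0.5] * 1000 + [0.0] * 1000 + [0.5] * 1000
print(speech_segments(audio, sample_rate=1000, frame_ms=100))
# → [(1.0, 2.0), (3.0, 4.0)]
```

You would then run the transcriber only on those spans and add each span's start time to whatever timestamps come back, so subtitles still line up with the original video.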

hnlmorg 10 days ago

This is designed for real-time use too. And in such cases, you couldn’t delete the silence before use.

42lux 10 days ago

The ffmpeg implementation might be; the example was not.