I've heard that, but it doesn't sound like a useful approach for videos where (1) non-speech segments can contain plenty of other sound (music, noise) and (2) you want timestamps to match the original video, as with subtitles. But maybe there are known mitigations for both of those issues that I'm not aware of. If they exist, maybe they could be included in the ffmpeg Whisper integration.
By "delete", people mostly mean "detect", so that you can avoid processing such segments through Whisper. There's no reason to actually cut the silence out from the original audio file.
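A minimal sketch of the "detect, don't delete" idea, assuming a plain RMS-energy threshold per frame. This is a crude stand-in for a real voice activity detector, and as noted above, music or noise would pass it. The function names, frame size, and threshold are hypothetical. The point is only that speech regions are reported in seconds on the original timeline, so anything a transcriber returns for a region can be shifted back by that region's start offset.

```python
from typing import List, Optional, Tuple


def detect_speech_regions(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) regions whose frame RMS energy is >= threshold.

    Hypothetical helper: energy thresholding, not a trained VAD. Nothing is cut;
    times are seconds relative to the start of the ORIGINAL audio.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        t = i / sample_rate  # frame start time on the original timeline
        if rms >= threshold and start is None:
            start = t  # a loud region begins
        elif rms < threshold and start is not None:
            regions.append((start, t))  # the loud region ended at this frame
            start = None
    if start is not None:  # audio ended while still in a loud region
        regions.append((start, len(samples) / sample_rate))
    return regions


def to_original_time(
    region_start: float, seg_start: float, seg_end: float
) -> Tuple[float, float]:
    # A transcriber run on one region reports times relative to that region;
    # adding the region's offset maps them back onto the original video.
    return (region_start + seg_start, region_start + seg_end)
```

For example, `detect_speech_regions([0.0]*200 + [0.5]*300 + [0.0]*200, 1000, frame_ms=100)` yields one region from 0.2 s to 0.5 s, and only that slice would be sent for transcription.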
This is designed for real-time use too, and in that case you couldn't delete the silence ahead of time anyway.
The ffmpeg implementation might be, but the example was not.