Comment by freakynit
17 hours ago
So I just tested this on a bunch of videos. It's crazy accurate (not 100%, obviously), fast, and resource-efficient.
I wasn't only testing the STT part, but also timestamps and speaker identification. Tested on 3 different videos, both local and online.
Timestamps were precise to under 500 ms, even on longer 20+ minute videos. Speaker identification worked equally well. My old M1 Air didn't hang even a bit while the transcription was running.
---
1. Here's one from a single speaker video (https://www.youtube.com/watch?v=-X6YzlY_8tM): https://pastebin.com/vPVPNnne
2. A shorter one with up to 4 different speakers and mixed, complex scene/narration changes (https://www.youtube.com/watch?v=4tASl0auPOg): https://pastebin.com/iHZZD8Qe
---
My zsh shorthand: `alias transcribe="yapsnap --timestamps --diarize "`
Reading this means a lot, thank you! Even faster version in the coming days, stay tuned and PRs welcome!
Awesome, and thanks for building this. Works superbly.