Comment by spudlyo

4 days ago

So, this project consists of a ~175 line README and a ~500 line Python program that glues yt-dlp and Kroko together. Neat.

I guess if it encourages you to install and figure out how to use ffmpeg, yt-dlp, Kroko, numpy, and onnx, that's a good thing. Sometimes just knowing a thing is possible is a huge benefit.

I see the value as a centralized anti-content-blocker.

This repo is now a good way to centralize hacks around the sure-to-come blockers those platforms will add to prevent downloads.

Just like uBlock Origin was a way to centralize all the "just run this Greasemonkey script" comments, I can see this getting a huge following among people who really value transcriptions.

  • I appreciate the perspective! That's a higher ceiling than I'd put on it, but if it gets there, awesome. PRs welcome!

Thank you. You nailed the actual value, that's right. The real win is just knowing you can do this on a laptop CPU, offline, with no GPU or cloud bill. There are tiny done-for-you details, like rescaling token timestamps back to real time after the atempo speedup so --timestamps doesn't lie to you, but they are minor.
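
For anyone curious what that rescaling amounts to, here is a minimal sketch and not the project's actual code. It assumes the audio was sped up by a constant factor with ffmpeg's atempo filter (e.g. atempo=2.0) and that the recognizer reports token times in seconds on the sped-up clip. The Token class and the rescale_to_real_time name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical container; the real tool's token structure may differ.
    text: str
    start: float  # seconds, measured on the sped-up audio
    end: float    # seconds, measured on the sped-up audio


def rescale_to_real_time(tokens, speed):
    """Map timestamps from sped-up audio back to the original timeline.

    A clip played at `speed`x is `speed` times shorter, so a moment at
    t seconds in the fast clip sits at t * speed seconds in the original.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [Token(t.text, t.start * speed, t.end * speed) for t in tokens]


# Tokens as they might come back from transcribing audio sped up 2x.
fast_tokens = [Token("hello", 0.0, 0.4), Token("world", 0.5, 1.0)]
real_tokens = rescale_to_real_time(fast_tokens, speed=2.0)

for tok in real_tokens:
    print(f"{tok.start:.2f}-{tok.end:.2f} {tok.text}")
```

The multiplication only holds for a constant speed factor. If the speed varied over the clip, you would need a piecewise mapping instead.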

  • Why the choice of Kroko over something like parakeet-tdt-0.6b-v3, which is also faster than realtime on CPU?

    • Kroko models are more accurate, and they are only about a hundred megabytes, compared to 2.5 gigabytes for parakeet in its default fp32.
