Comment by taminka

10 days ago

Whisper is great. I wonder why YouTube's auto-generated subs are still so bad? Even the smallest Whisper model is way better than Google's solution. Is it a licensing issue? Is it harder to deploy at scale?

I believe YouTube still uses 40 mel-scale bands as feature data, while Whisper uses 80. That gives finer spectral detail but is naturally more computationally intensive to process, though modern hardware allows for that.
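To make the 40 vs 80 difference concrete, here is a rough sketch of how the mel band centers are spaced for each count. It assumes 16 kHz audio (which is what Whisper takes as input) and the HTK-style mel formula; the helper names are made up for illustration, and nothing here reflects how YouTube or Whisper actually builds its filterbank.

```python
import math


def hz_to_mel(hz: float) -> float:
    # HTK-style mel formula (an assumption; other variants such as Slaney exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, sample_rate: int = 16000) -> list[float]:
    """Center frequencies (Hz) of n_mels triangular filters spanning 0 Hz to Nyquist."""
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels triangular filters need n_mels + 2 equally spaced points on the mel axis;
    # the first and last points are only edges, not centers.
    points = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    return points[1:-1]


if __name__ == "__main__":
    for n_mels in (40, 80):
        centers = mel_band_centers(n_mels)
        low_gap = centers[1] - centers[0]
        high_gap = centers[-1] - centers[-2]
        print(
            f"{n_mels} bands: ~{low_gap:.0f} Hz between centers at the low end, "
            f"~{high_gap:.0f} Hz at the high end"
        )
```

With 80 bands the centers sit closer together across the whole range, which is the finer spectral detail. The cost is that every audio frame becomes an 80-value vector instead of a 40-value one, so the feature input to the model doubles in size.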

You’d think they’d use the better model at least for videos with large view counts (they already do that when deciding on compression optimizations).