Comment by properbrew
1 day ago
I fell down the rabbit hole of voice transcription about a year ago. I've always loved utilising fine-tuned LLMs, so I put two and two together and built https://news.ycombinator.com/item?id=48385906#48389625
I absolutely love working on this. I still wake up and the first thing I think about is voice transcription pipelines (sad, I know), but I'm excited to see how much further performance and utility I can squeeze out.
Are we the same person?? Haha, this is super close to the scope of work I've been doing and just released. Different objectives though. It sounds like yours prioritizes legacy hardware and is more enterprise focused (good for you!). Mine is focused more on long-term project tracking and program management for solo developers or solo builders.
I also got hammered when it came to diarization... I found that the biggest pain was creating an appropriate environment for cross-compatibility of the different backends required for whisper/faster-whisper/pyannote. It's especially challenging on older systems, so major kudos for giving it a shot.
Have you gotten any traction yet from the community?
> Mine is focused more on long-term project tracking and program management for solo developers or solo builders.
This looks very useful, will download and give it a shot later. It took me a few seconds to find it on your page, and only got to the screenshots in the "navigate" sections after clicking through a lot. I would suggest putting a screenshot or something on the landing page so people can see and understand what it is.
Thank you, it's nice to hear someone else has gone through similar pain (in a good way)!
It's been slow and steady, but it's hard. I've commented previously that whilst the cost of building software has plummeted compared to 2 or 3 years ago, selling it has got harder, and I feel this trend will keep accelerating.
Do you think this can just be used as is to create subtitles for a multi-person video?
How good is it at foreign languages?
I am looking for a nice solution to subtitle some old movies.
Not with the Whistle Enterprise software. However, I had a friend with a similar requirement, so I built Whistle Subtitles. You can download it from:
Linux - https://downloads.blazingbanana.com/whistle-subtitles/unstab...
Windows - https://downloads.blazingbanana.com/whistle-subtitles/unstab...
Mac - https://downloads.blazingbanana.com/whistle-subtitles/unstab...
This was built just for them, so I've not spent too much time on the UI (ignore `unstable` in the name, it's just not on a proper release branch), but it's completely free, so give it a go if you want. I'm working on the diarisation step so it can tag subtitles to people, but that's not ready yet.
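For anyone wondering what "tagging subtitles to people" could look like, here is a minimal Python sketch. It is not how Whistle Subtitles does it. It assumes you already have timestamped ASR segments and diarisation turns from some other tools, and every name in it (`Segment`, `Turn`, `assign_speaker`, `to_srt`) is made up for illustration. Each subtitle gets the speaker whose turn overlaps it the most, and the result is written out in SRT format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One transcribed chunk of speech (hypothetical ASR output)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """One speaker turn (hypothetical diarisation output)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg: Segment, turns: List[Turn]) -> Optional[str]:
    """Pick the speaker whose turn overlaps the segment most, or None."""
    best_speaker, best_overlap = None, 0.0
    for turn in turns:
        ov = overlap(seg.start, seg.end, turn.start, turn.end)
        if ov > best_overlap:
            best_speaker, best_overlap = turn.speaker, ov
    return best_speaker


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment], turns: List[Turn]) -> str:
    """Render segments as SRT, prefixing each line with its speaker if known."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = assign_speaker(seg, turns)
        text = f"[{speaker}] {seg.text}" if speaker else seg.text
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)
```

Calling `to_srt(segments, turns)` returns a string you can save as a `.srt` file. Picking the speaker by maximum overlap is the simplest rule that works; it will mislabel lines where two people talk over each other, which is one reason real diarisation is painful.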
Whistle Subtitles uses NVIDIA Parakeet as the ASR model, which is very much focused on European languages. The supported ones are:
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian
If these languages aren't what you're looking for, let me know what you need and I'll see what I can do.
I use subtitles extensively for everything I watch, so if I can help someone make older movies more accessible with them then that would make me happy.