Comment by jokethrowaway
10 days ago
Whisper is definitely nice, but it's a bit too slow for me. Having subtitles and transcripts for everything is great, but NeMo Parakeet (basically NVIDIA's take on Whisper) completely changed how I interact with my computer.
It makes dictation actually work, and it's as fast as you can think. I also have a set of scripts that just wait for voice commands and act on them: I can pipe the transcript to an LLM, run commands, or synthesize a spoken reply with F5-TTS, and it's like having a local Jarvis.
The main limitation is that it's English-only.
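The voice-command scripts mentioned above weren't shared in the thread, so this is only a hypothetical sketch of the dispatch step. It assumes the speech-to-text part has already produced a plain string, and it maps spoken trigger phrases to handler functions. The triggers (`open`, `ask`) and the handlers are made up for illustration; in a real setup a handler might call `subprocess.run`, send the text to an LLM, or hand a reply to F5-TTS.

```python
import re
from typing import Callable, Optional

# A handler receives the rest of the utterance after its trigger phrase.
Handler = Callable[[str], str]


def normalize(utterance: str) -> str:
    """Lowercase and strip punctuation so 'Open Firefox.' matches 'open firefox'."""
    return re.sub(r"[^\w\s]", "", utterance).lower().strip()


def dispatch(utterance: str, commands: dict[str, Handler]) -> Optional[str]:
    """Run the handler whose trigger phrase starts the utterance; None if nothing matches."""
    text = normalize(utterance)
    # Try longer triggers first so a more specific phrase wins over a shorter prefix.
    for trigger in sorted(commands, key=len, reverse=True):
        if text == trigger or text.startswith(trigger + " "):
            return commands[trigger](text[len(trigger):].strip())
    return None


# Hypothetical command table; these handlers only describe what they would do.
COMMANDS: dict[str, Handler] = {
    "open": lambda arg: f"would launch {arg}",
    "ask": lambda question: f"would send to LLM: {question}",
}

if __name__ == "__main__":
    print(dispatch("Open Firefox.", COMMANDS))
```

Keeping matching on normalized text means small transcription quirks like trailing punctuation or capitalization don't break a command.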
Would you share the scripts?
Or at least more details. Very cool!
Yeah, mind sharing any of the scripts? I looked at the docs briefly, and it looks like we need to install ALL of NeMo to get access to Parakeet? That seems ultra heavy.
You only need the ASR bits. This is where I got to when I previously looked into running Parakeet:
Then run a transcribe.py script in that venv:
With that I was able to run the model, but I ran out of memory on my lower-spec laptop. I haven't yet got around to running it on my workstation.
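The thread doesn't say what caused the out-of-memory error. If it was a long recording, one common workaround (an assumption here, not something the commenter tried) is to split the audio into shorter chunks and transcribe each one separately. This minimal sketch uses only Python's standard `wave` module; `split_wav` and the 30-second default are hypothetical choices.

```python
import os
import wave


def split_wav(path: str, out_dir: str, chunk_seconds: float = 30.0) -> list[str]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each."""
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        base = os.path.splitext(os.path.basename(path))[0]
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # readframes returns b"" at end of file
                break
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy channels/sample width/rate; the frame count in the header
                # is patched automatically to match what is actually written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Fixed-length cuts can split a word at a chunk boundary, so overlapping chunks or splitting on silence would give cleaner transcripts. The resulting chunk paths can then be passed to the model one at a time or in small batches.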
You'll need to modify the Python script to process the response and output it in a format you can use.
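As an illustration of that post-processing step (a sketch, not the commenter's code), the helpers below pair each audio path with its transcript and emit one JSON object per line. It assumes each result is either a plain string or an object with a `.text` attribute. Which one you get depends on the NeMo version, so check what your transcribe call actually returns. `result_text` and `to_json_lines` are hypothetical names.

```python
import json
from typing import Any, Iterable


def result_text(item: Any) -> str:
    """Return transcript text from one result: a plain string or an object with .text."""
    if isinstance(item, str):
        return item
    return item.text


def to_json_lines(paths: Iterable[str], results: Iterable[Any]) -> str:
    """Pair each audio path with its transcript, one JSON object per line."""
    return "\n".join(
        json.dumps({"audio": path, "text": result_text(item)})
        for path, item in zip(paths, results)
    )
```

JSON Lines output is easy to consume from other scripts, for example by reading it line by line and feeding each transcript into an LLM prompt.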
Thanks!