
Comment by jokethrowaway

10 days ago

Whisper is definitely nice, but it's a bit too slow. Having subtitles and transcription for everything is great, but NeMo Parakeet (pretty much Whisper by NVIDIA) completely changed how I interact with the computer.

It enables dictation that actually works, and it's as fast as you can think. I also have a set of scripts that just wait for voice commands and do things. I can pipe the results to an LLM, run commands, synthesize a voice back with F5-TTS, and it's like having a local Jarvis.

The main limitation is that it's English-only.

Yeah, mind sharing any of the scripts? I looked at the docs briefly; it looks like we need to install ALL of NeMo to get access to Parakeet? Seems ultra heavy.

  • You only need the ASR bits -- this is where I got to when I previously looked into running Parakeet:

        # NeMo does not run on 3.13+
        python3.12 -m venv .venv
        source .venv/bin/activate
    
        git clone https://github.com/NVIDIA/NeMo.git nemo
        cd nemo
    
        pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu128
        pip install ".[asr]"  # quoted so zsh doesn't try to glob the brackets
    
        deactivate
    

    Then run a transcribe.py script in that venv:

        import os
        import sys
        import nemo.collections.asr as nemo_asr
    
        model_path = sys.argv[1]
        audio_path = sys.argv[2]
    
        # Load from a local .nemo checkpoint, or download from
        # Hugging Face when given a model name like 'org/model'.
        if os.path.exists(model_path):
            asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(restore_path=model_path)
        else:
            asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=model_path)
    
        output = asr_model.transcribe([audio_path])
        print(output[0])
    

    With that I was able to run the model, but I ran out of memory on my lower-spec laptop. I haven't yet got around to running it on my workstation.

    You'll need to modify the Python script to process the response and output it in a format you can use.
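    For example, if you get the transcript back as timed segments, a small pure formatter can turn them into SRT subtitles. The `(start_seconds, end_seconds, text)` tuples are an assumed input shape here; wiring NeMo up to actually emit timestamps is not shown:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:01,500."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

    The same segment list can just as easily be dumped as JSON or piped straight into another tool, which is the point of keeping the formatting separate from the model call.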