
Comment by nsbk

7 hours ago

I'm putting together a streaming ASR + LLM + streaming TTS setup based on Nvidia speech models: Nemotron for ASR and Magpie for TTS, with pipecat to glue everything together, plus an LLM of your choice. I added Spanish support using Canary models, since the Magpie models are English-only, and it still works really well.
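
For anyone curious what the glue looks like, here is a minimal sketch of the kind of pipecat pipeline this describes. The Pipeline/PipelineTask/PipelineRunner pattern is standard pipecat; the service class names, import paths, constructor arguments, and the Spanish language code are assumptions that vary by pipecat version and are not copied from the repo.

```python
# Sketch only: service classes and their args are assumptions, not the repo's code.
import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
# Illustrative imports: pipecat's NVIDIA (Riva) and OpenAI-compatible services
# have moved between releases, so check your installed version.
from pipecat.services.riva import RivaSTTService, RivaTTSService
from pipecat.services.openai import OpenAILLMService


async def main():
    # Streaming ASR. For Spanish this is where a Canary model replaces the
    # English-only default (language code assumed).
    stt = RivaSTTService(language="es-US")

    # Any OpenAI-compatible backend works here; see the local-Mistral snippet
    # further down for pointing this at a GGUF served on the same machine.
    llm = OpenAILLMService(api_key="unused", model="local-model")

    # Streaming TTS (Magpie for English; voice/model selection assumed).
    tts = RivaTTSService()

    # Transport input/output (WebRTC, local audio, ...) is elided here;
    # frames flow ASR -> LLM -> TTS in this order.
    pipeline = Pipeline([stt, llm, tts])

    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```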

The work is based on a repo by pipecat that I forked and modified: I made it more comfortable to run (Docker Compose for the server and client), added Spanish support via Canary models, and added Nvidia Ampere support so it can run on my 3090.
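
For context on what "Docker Compose for the server and client" usually looks like, here is a minimal sketch; the service names, build paths, and ports are placeholders rather than the repo's actual layout. The `deploy.resources.reservations` block is Docker Compose's documented way to hand an Nvidia GPU (such as a 3090) to the server container.

```yaml
# Illustrative only: images, paths and ports are placeholders.
services:
  server:
    build: ./server            # hypothetical path
    ports:
      - "7860:7860"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  client:
    build: ./client            # hypothetical path
    ports:
      - "3000:3000"
    depends_on:
      - server
```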

The use case is a conversation partner for my gf, who is learning Spanish, and it works incredibly well. For the LLM I settled on Mistral-Small-3.2-24B-Instruct-2506-Q4_K_S.gguf.
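
I won't claim which runtime serves the GGUF here, but anything that exposes an OpenAI-compatible endpoint (llama.cpp's llama-server, Ollama, and similar) plugs into pipecat's LLM stage; the URL, port, and model string below are assumptions for illustration, and the import path may differ by pipecat version.

```python
# Assumes a local runtime (llama.cpp server, Ollama, ...) is already serving
# the Mistral GGUF behind an OpenAI-compatible API; URL and model name are
# placeholders, not taken from the linked repo.
from pipecat.services.openai import OpenAILLMService

llm = OpenAILLMService(
    base_url="http://localhost:8080/v1",  # assumed local endpoint
    api_key="unused",                     # local servers generally ignore the key
    model="Mistral-Small-3.2-24B-Instruct-2506",
)
# This replaces the generic LLM stage in the pipeline sketch above.
```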

https://github.com/nsbk/nemotron-january-2026