
Comment by nsbk

7 hours ago

I'm putting together a streaming ASR + LLM + streaming TTS setup based on Nvidia speech models: Nemotron for ASR and Magpie for TTS, with pipecat to glue everything together, plus an LLM of your choice. I added Spanish support using Canary models, since the Magpie models are English-only, and it still works really well.
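
For anyone curious what the glue looks like, here is a minimal sketch of the kind of pipecat pipeline this describes. The Pipeline/PipelineTask/PipelineRunner pattern is standard pipecat; the service class names, import paths, constructor arguments, and the Spanish language code are assumptions that vary by pipecat version and are not copied from the repo.

```python
# Sketch only: service classes and their args are assumptions, not the repo's code.
import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
# Illustrative imports: pipecat's NVIDIA (Riva) and OpenAI-compatible services
# have moved between releases, so check your installed version.
from pipecat.services.riva import RivaSTTService, RivaTTSService
from pipecat.services.openai import OpenAILLMService


async def main():
    # Streaming ASR. For Spanish this is where a Canary model replaces the
    # English-only default (language code assumed).
    stt = RivaSTTService(language="es-US")

    # Any OpenAI-compatible backend works here; see the local-Mistral snippet
    # further down for pointing this at a GGUF served on the same machine.
    llm = OpenAILLMService(api_key="unused", model="local-model")

    # Streaming TTS (Magpie for English; voice/model selection assumed).
    tts = RivaTTSService()

    # Transport input/output (WebRTC, local audio, ...) is elided here;
    # frames flow ASR -> LLM -> TTS in this order.
    pipeline = Pipeline([stt, llm, tts])

    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```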

The work is based on a repo by pipecat that I forked and modified: I made it more comfortable to run (Docker Compose for the server and client), added Spanish support via Canary models, and added Nvidia Ampere support so it can run on my 3090.
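
For context on what "Docker Compose for the server and client" usually looks like, here is a minimal sketch; the service names, build paths, and ports are placeholders rather than the repo's actual layout. The `deploy.resources.reservations` block is Docker Compose's documented way to hand an Nvidia GPU (such as a 3090) to the server container.

```yaml
# Illustrative only: images, paths and ports are placeholders.
services:
  server:
    build: ./server            # hypothetical path
    ports:
      - "7860:7860"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  client:
    build: ./client            # hypothetical path
    ports:
      - "3000:3000"
    depends_on:
      - server
```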

The use case is a conversation partner for my gf, who is learning Spanish, and it works incredibly well. For the LLM I settled on Mistral-Small-3.2-24B-Instruct-2506-Q4_K_S.gguf.
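
I won't claim which runtime serves the GGUF here, but anything that exposes an OpenAI-compatible endpoint (llama.cpp's llama-server, Ollama, and similar) plugs into pipecat's LLM stage; the URL, port, and model string below are assumptions for illustration, and the import path may differ by pipecat version.

```python
# Assumes a local runtime (llama.cpp server, Ollama, ...) is already serving
# the Mistral GGUF behind an OpenAI-compatible API; URL and model name are
# placeholders, not taken from the linked repo.
from pipecat.services.openai import OpenAILLMService

llm = OpenAILLMService(
    base_url="http://localhost:8080/v1",  # assumed local endpoint
    api_key="unused",                     # local servers generally ignore the key
    model="Mistral-Small-3.2-24B-Instruct-2506",
)
# This replaces the generic LLM stage in the pipeline sketch above.
```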

https://github.com/nsbk/nemotron-january-2026