
Comment by armcat

1 day ago

Super nice! I've been using Kokoro locally, which is 82M parameters and runs (and sounds) amazing! https://huggingface.co/hexgrad/Kokoro-82M

I tried Kokoro-JS, which I think runs in the browser, and it was way too slow (high latency). It also didn't support the language I wanted.

  • I have a 5070 in my rig. I'm running Kokoro in a Python/FastAPI backend. I also use local quantized models (I swap between ministral-3 and Qwen3) as "the brains", and offload to GPT-5.2 (including web search) for "complex" tasks or those that need the web. In the backend, Kokoro generates WAV bytes that I send to the frontend. The frontend is just a simple HTML page with a textbox and a button that invokes `fetch()`. I type, and it responds in audio. The round-trip time is under 1 second for me, unless it needs to call the OpenAI API for a "complex" task. I have yet to integrate STT; once that's in, the loop is complete. That's the stack, and it's not slow at all, but it depends on your hardware.
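The "generate WAV bytes in the backend and return them to a `fetch()` call" part can be sketched roughly like this. It is a minimal, dependency-free sketch under stated assumptions, not the setup described above. The commenter uses FastAPI and Kokoro, while this swaps in the stdlib `http.server`. It also swaps in a hypothetical `synthesize()` stub that emits a tone instead of speech. The `/speak` route, port 8000, and the 24 kHz sample rate are likewise assumptions. The part that carries over is `to_wav_bytes()`, which packs float samples into an in-memory 16-bit mono WAV that a browser can play directly.

```python
import io
import math
import struct
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

SAMPLE_RATE = 24_000  # assumption: Kokoro-82M produces 24 kHz mono audio


def synthesize(text: str) -> list[float]:
    """Hypothetical stand-in for the Kokoro call; returns floats in [-1, 1].

    It just emits a 440 Hz tone whose length scales with the text (capped at
    5 s), so the sketch runs without any model installed.
    """
    n = int(SAMPLE_RATE * min(0.05 * len(text), 5.0))
    return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def to_wav_bytes(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples as a 16-bit mono PCM WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        # Clip each sample to [-1, 1], scale to the int16 range, little-endian.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(pcm)
    return buf.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    """POST plain text to /speak (hypothetical route) and get audio/wav back."""

    def do_POST(self):
        if self.path != "/speak":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = to_wav_bytes(synthesize(text))
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), TTSHandler).serve_forever()
```

With the server running, `curl -X POST --data "hello" http://127.0.0.1:8000/speak -o out.wav` should produce a playable file. On the frontend, `fetch()` the same route, turn the response into a `Blob`, and hand an object URL to an `<audio>` element. Returning a complete WAV (header plus PCM) rather than raw samples keeps the browser side trivial, at the cost of waiting for the whole utterance before playback starts.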