Comment by armcat

2 months ago

Super nice! I've been using Kokoro locally, which is 82M parameters and runs (and sounds) amazing! https://huggingface.co/hexgrad/Kokoro-82M

3 comments

machiaweliczny 2 months ago

BTW, does anyone know of a good open-source voice assistant stack? I used https://github.com/ricky0123/vad for voice activation (works well), then just the Web Speech API, as that's the fastest, and then a commercial TTS for speed, as I couldn't find a good one.

machiaweliczny 2 months ago

I tried Kokoro-JS, which I think runs in the browser, and it was way too slow. Besides the latency, it also didn't support the language I wanted.

armcat 1 month ago

I have a 5070 in my rig. What I'm running is Kokoro in a Python/FastAPI backend. I also use local quantized models (I swap between ministral-3 and Qwen3) as "the brains", offloading to GPT-5.2 (including web search) for "complex" tasks or those requiring the web. In the backend I use Kokoro to generate wav bytes that I send to the frontend. The frontend is just a simple HTML page with a textbox and a button that invokes a `fetch()`. I type, and it responds in audio. The round-trip time is under 1 second for me, unless it needs to call the OpenAI API for "complex" tasks. I have yet to integrate STT; once that's in, the cycle is complete. That's the stack, and it's not slow at all, but it depends on your HW.
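
For anyone wanting to try the "generate wav bytes" step, here is a minimal, stdlib-only sketch of how it might look. It is not armcat's actual code: `fake_synthesize` is a hypothetical stand-in for the real TTS call (it just emits a test tone), and the 24 kHz sample rate is an assumption you should match to whatever your model outputs.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: match this to your TTS model's output rate


def fake_synthesize(text: str) -> list[float]:
    """Hypothetical stand-in for the TTS call: 50 ms of a 440 Hz tone per character."""
    n = (SAMPLE_RATE // 20) * len(text)
    return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def to_wav_bytes(samples, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Pack float samples in [-1, 1] as 16-bit mono PCM inside a WAV container."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


wav_bytes = to_wav_bytes(fake_synthesize("hi"))
```

In a FastAPI handler, bytes like these could be returned with something like `Response(content=wav_bytes, media_type="audio/wav")`, and the page's `fetch()` could then read the body as a blob and play it through an audio element.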