Comment by armcat
14 hours ago
I have a 5070 in my rig. What I'm running is Kokoro in a Python/FastAPI backend. I also use local quantized models (I swap between ministral-3 and Qwen3) as "the brains", offloading to GPT-5.2, including web search, for "complex" tasks or those requiring the web. In the backend I use Kokoro to generate wav bytes that I send to the frontend. The frontend is just a simple HTML page with a textbox and a button, invoking a `fetch()`. I type, and it responds back in audio. The round-trip time is <1 second for me, unless it needs to call the OpenAI API for "complex" tasks. I have yet to integrate STT, and then the cycle is complete. That's the stack; it's not slow at all, but it depends on your HW.
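The wav-bytes round trip described above can be sketched roughly as follows. This is a minimal stdlib-only sketch: `synthesize_stub` is a hypothetical stand-in for the Kokoro model (not Kokoro's actual API), and the resulting bytes are what a FastAPI endpoint would return as an `audio/wav` response body.

```python
import io
import math
import struct
import wave

def synthesize_stub(text: str, sample_rate: int = 24000) -> list[int]:
    # Hypothetical stand-in for the TTS model: emit 0.5 s of a 440 Hz tone
    # as mono 16-bit PCM samples. A real backend would call Kokoro here.
    n = sample_rate // 2
    return [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / sample_rate))
            for i in range(n)]

def pcm_to_wav_bytes(samples: list[int], sample_rate: int = 24000) -> bytes:
    # Wrap mono 16-bit PCM in a WAV container entirely in memory, so the
    # bytes can be sent straight back to the browser's fetch() call.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

wav = pcm_to_wav_bytes(synthesize_stub("hello"))
print(wav[:4], wav[8:12])  # RIFF container header, WAVE format marker
```

In a FastAPI route this would be returned as `Response(content=wav, media_type="audio/wav")`, and the frontend can play it via `URL.createObjectURL(await resp.blob())`.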