Comment by nitroedge
13 days ago
For better lip sync, you could try using Rhubarb to extract mouth cues from the MP3. What backend speech processor are you using to get the real-time streaming response? Rhubarb would add a bit of latency for sure.
13 days ago
For real-time: we use WebRTC for streaming. The pipeline is streaming STT, then a low-latency LLM, then TTS, and then we drive Live2D parameters on the client. Lip sync: we currently use a simple phoneme/amplitude-based approach and are testing viseme extraction. Rhubarb is on our list, but we’re cautious about the added latency.
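To make the amplitude-based part concrete, here is a rough sketch of how that kind of lip sync could work. It is not our actual client code. It takes frames of audio samples in [-1.0, 1.0], computes the RMS level of each frame, and maps it to a mouth-open value in [0.0, 1.0], with a faster attack than release so the mouth does not flutter. The function names and the `noise_floor`, `full_open`, `attack`, and `release` values are all assumptions made up for illustration.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def mouth_open_values(
    frames: Iterable[Sequence[float]],
    noise_floor: float = 0.02,  # assumed: levels at or below this count as silence
    full_open: float = 0.30,    # assumed: level that maps to a fully open mouth
    attack: float = 0.6,        # assumed: smoothing weight while the mouth opens
    release: float = 0.25,      # assumed: smoothing weight while the mouth closes
) -> List[float]:
    """Map each audio frame to a smoothed mouth-open value in [0.0, 1.0]."""
    values: List[float] = []
    current = 0.0
    for frame in frames:
        level = rms(frame)
        # Normalize the level into [0, 1] between the noise floor and full-open level.
        target = (level - noise_floor) / (full_open - noise_floor)
        target = min(1.0, max(0.0, target))
        # Open quickly, close more slowly, to reduce visible jitter.
        weight = attack if target > current else release
        current += weight * (target - current)
        values.append(current)
    return values
```

Each returned value would be written to the model's mouth-open parameter once per frame (many Live2D models expose one named something like `ParamMouthOpenY`, but that depends on the model). Because this only needs the current audio frame, it adds almost no latency, which is the trade-off against an offline analysis pass such as Rhubarb.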