Comment by nitroedge
3 days ago
For better lip sync, you could try using Rhubarb to extract mouth cues from the MP3. What backend speech processor are you using to get the real-time streaming response? Rhubarb would definitely add a bit of latency, though.
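Here is a minimal sketch of how Rhubarb's output could be consumed. It assumes Rhubarb was run with JSON export (e.g. `rhubarb -f json -o cues.json speech.wav`). Rhubarb reads WAV or Ogg Vorbis, so an MP3 would need converting first. The `CueTrack` helper and the shape-to-mouth-open values are hypothetical and would need tuning per Live2D model.

```python
import bisect
import json
from dataclasses import dataclass, field

# Hypothetical mapping from Rhubarb mouth shapes to a Live2D "mouth open" value in [0, 1].
# Illustrative numbers only; tune them for the specific model.
SHAPE_TO_MOUTH_OPEN = {
    "A": 0.0,  # closed mouth (P, B, M)
    "B": 0.2,  # slightly open, clenched teeth
    "C": 0.5,  # open mouth
    "D": 1.0,  # wide open mouth
    "E": 0.4,  # slightly rounded
    "F": 0.3,  # puckered lips
    "G": 0.2,  # upper teeth on lower lip (F, V)
    "H": 0.4,  # tongue raised (L)
    "X": 0.0,  # idle / rest
}


@dataclass(frozen=True)
class MouthCue:
    start: float  # seconds
    end: float    # seconds
    value: str    # Rhubarb mouth shape letter


@dataclass
class CueTrack:
    """Hypothetical helper: time-indexed lookup over Rhubarb's JSON mouthCues."""
    cues: list[MouthCue]
    _starts: list[float] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Sort by start time so binary search works.
        self.cues.sort(key=lambda c: c.start)
        self._starts = [c.start for c in self.cues]

    @classmethod
    def from_json(cls, text: str) -> "CueTrack":
        data = json.loads(text)
        return cls([MouthCue(c["start"], c["end"], c["value"]) for c in data["mouthCues"]])

    def mouth_open_at(self, t: float) -> float:
        """Return the mouth-open value for playback time t (seconds)."""
        i = bisect.bisect_right(self._starts, t) - 1
        if i < 0 or t >= self.cues[i].end:
            return SHAPE_TO_MOUTH_OPEN["X"]  # t falls outside any cue: rest pose
        return SHAPE_TO_MOUTH_OPEN.get(self.cues[i].value, 0.0)
```

On each render frame, the client would look up the current audio playback time and set the model's mouth parameter to the returned value.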
For real-time, we use WebRTC for streaming. The pipeline is streaming STT, then a low-latency LLM, then TTS, and finally we drive the Live2D parameters on the client. For lip sync, we currently use a simple phoneme/amplitude-based approach and are testing viseme extraction. Rhubarb is on our list, but we’re cautious about the added latency.
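A common amplitude-based approach computes the RMS of each audio frame from the TTS stream and smooths it with separate attack and release rates. This adds no lookahead latency. Below is a minimal sketch that assumes mono float PCM samples in [-1, 1]. The thresholds and rates are illustrative guesses. Feeding the result into a mouth parameter, such as Cubism's standard `ParamMouthOpenY`, is left to the client.

```python
import math
from typing import Iterable, Iterator, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mouth_open_envelope(
    frames: Iterable[Sequence[float]],
    noise_floor: float = 0.02,    # illustrative: RMS below this keeps the mouth closed
    full_open_rms: float = 0.3,   # illustrative: RMS at or above this opens fully
    attack: float = 0.6,          # smoothing rate when the mouth is opening
    release: float = 0.2,         # smoothing rate when the mouth is closing
) -> Iterator[float]:
    """Yield a smoothed mouth-open value in [0, 1] for each incoming frame."""
    level = 0.0
    for frame in frames:
        # Normalize RMS between the noise floor and the full-open level, then clamp.
        target = (frame_rms(frame) - noise_floor) / (full_open_rms - noise_floor)
        target = min(1.0, max(0.0, target))
        # One-pole smoothing with separate rates for opening and closing.
        coeff = attack if target > level else release
        level += coeff * (target - level)
        yield level
```

Using separate rates is a design choice. A fast attack keeps plosives visible, while a slower release avoids jitter between syllables.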