Comment by Aeroi

10 hours ago

I run the Gemini Live API over a mesh hosted managed WebRTC cloud. It works fantastic, and I've been running it for 2 years. You can try WebSocket, handle ephemeral keys, etc., but when you speak with people running voice agents at scale in this space, many of the issues are solved by WebRTC and Pipecat and the many resources already devoted to solved problems in this space. It certainly feels like overkill, and it probably is, but once the connection is established, it's pretty magical. The startup time and buffering have been solved for quicker voice connections too: https://github.com/pipecat-ai/pipecat-examples/tree/main/ins... (video is harder).