Comment by mariano54

1 day ago

Multiple reasons (which also apply to OpenAI's realtime API):

- It's less intelligent than the non-voice APIs.
- Intelligence degrades even further with lots of context.
- It's more expensive.
- Latency is not a free lunch: lower latency comes at the cost of more interruptions from the tutor, which is really bad UX. We prefer to interrupt less and accept higher latency (rough sketch of that trade-off below).
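
To make that last point concrete, here's a minimal sketch of how the trade-off shows up in turn-taking: the tutor waits for a stretch of silence before responding, and the silence threshold is the knob you turn between latency and interruptions. The threshold value and helper name below are hypothetical, not a description of our actual stack.

```python
import time

# Hypothetical value: a longer wait means higher latency but fewer false barge-ins.
END_OF_TURN_SILENCE_SEC = 1.2


def should_respond(last_speech_timestamp: float, now: float | None = None) -> bool:
    """Respond only after the user has been silent long enough.

    A short threshold (e.g. 0.3 s) feels snappy but often fires while the
    student is just pausing to think, so the tutor talks over them. A longer
    threshold trades some latency for far fewer interruptions.
    """
    now = time.monotonic() if now is None else now
    return (now - last_speech_timestamp) >= END_OF_TURN_SILENCE_SEC
```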

Also, we prefer the ElevenLabs voices, though the quality definitely varies. I'm guessing later this year or next the voice-to-voice models will become good enough, and we'll switch over.