Comment by chriswep

6 hours ago

In my tests this doesn't come close to the years-old coqui/XTTS-v2, which has great voice-cloning capabilities and produces rich, emotional speech with low latency. I've tried several local TTS projects over the years, but I'm somewhat confused that nothing seems to match Coqui despite the leaps we see in other areas of AI. Can somebody with more knowledge of this field explain why that might be? Or am I completely missing something?