Comment by cachius

1 month ago

What's your experience at high speeds, with garbled speech artifacts and pronunciation accuracy?


cachius

With Supertonic, or overall? Overall, most do pretty well, though some are funky: suprano was so bad no matter what I did that I had to rule it out of my top contenders for anything. Supertonic was close to being my number one choice for my agentic pipeline because it was insanely fast and the quality was great, but it didn't have the bells and whistles that some other models have, so I set it aside for future CPU-only projects. If you're going to run on a GPU, I'd suggest Chatterbox or Pocket TTS. Chatterbox is my top contender right now because it sounds amazing, has voice cloning, and I got it down to 0.26 ttfa/ttsa once I quantized it and integrated pipecat into it. Pocket TTS is probably my second choice for similar reasons.