Comment by iLoveOncall
19 hours ago
Chatterbox-TTS has a MUCH MUCH better output quality though, the quality of the output from Sopro TTS (based on the video embedded on GitHub) is absolutely terrible and completely unusable for any serious application, while Chatterbox has incredible outputs.
I have an RTX5090, so not exactly what most consumers will have but still accessible, and it's also very fast, around 2 seconds of audio per 1 second of generation.
Here's an example I just generated (first try, 22 seconds runtime, 14 seconds of generation): https://jumpshare.com/s/Vl92l7Rm0IhiIk0jGors
Here's another one, 20 seconds of generation, 30 seconds of runtime, which clones a voice from a Youtuber (I don't use it for nefarious reasons, it's just for the demo): https://jumpshare.com/s/Y61duHpqvkmNfKr4hGFs with the original source for the voice: https://www.youtube.com/@ArbitorIan
You should try it! I wouldn’t say it’s the best, far from that. But also wouldn’t say it’s terrible. If you have a 5090, then yes, you can run much more powerful models in real time. Chatterbox is a great model though
> But also wouldn’t say it’s terrible.
But you included 3 samples on your GitHub video and they all sound extremely robotic and have very bad artifacts?
[dead]
I've been using Higgs-Audio for a while now as the primary TTS system. How would you say does Chatterbox compare to it if you have experience?
I haven't used it. I compared it with T5Gemma TTS that came out recently and Chatterbox is much better in all aspects, but especially in voice cloning where T5Gemma basically did not work.