Comment by vunderba

8 months ago

Most of these TTS systems tend to fall apart as the text gets longer. It's a good idea to split any longform text into paragraph-sized batches, generate each batch separately, and then stitch the audio back together at the end.
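For what it's worth, here is a rough sketch of that batch-and-stitch approach. It isn't tied to any particular engine: `tts` is a hypothetical callable you'd supply yourself that takes a string and returns a list of float samples, and `max_chars`, `sample_rate`, and `gap_seconds` are made-up knobs with arbitrary defaults, not anything from Chatterbox's API.

```python
import re
from typing import Callable, List


def split_into_batches(text: str, max_chars: int = 400) -> List[str]:
    """Group blank-line-separated paragraphs into batches of roughly max_chars.

    A single paragraph longer than max_chars is kept whole as its own batch.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    batches: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the batch, so start a new one.
            batches.append(current)
            current = para
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def synthesize_long(
    text: str,
    tts: Callable[[str], List[float]],  # hypothetical: your engine's wrapper
    sample_rate: int = 24000,           # assumed; use whatever your engine outputs
    gap_seconds: float = 0.3,           # assumed pause between batches
    max_chars: int = 400,
) -> List[float]:
    """Run tts on each batch and concatenate the audio with short silences."""
    silence = [0.0] * int(sample_rate * gap_seconds)
    audio: List[float] = []
    for i, batch in enumerate(split_into_batches(text, max_chars)):
        if i > 0:
            audio.extend(silence)
        audio.extend(tts(batch))
    return audio
```

In practice you'd probably return a NumPy array or tensor instead of a plain list, and you may want to split overlong paragraphs on sentence boundaries too, but the shape of the loop stays the same.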

I've also found that if your one-shot sample wave isn't really clean, Chatterbox sometimes produces random unholy whooshing sounds at the end of the generated audio, which is an added bonus if you're recording Dante's Inferno.