Comment by amelius
8 months ago
I tried some TTS models a while ago, but I noticed that none of them allowed markup statements in the text. For example, it would be nice to be able to write something like:
Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]
etc.
In fact, I think this kind of thing is absolutely necessary if you want to use this to replace a voice actor.
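To make the idea concrete, here is a rough sketch of how a front end might separate such bracketed tags from the spoken text before handing both to a model. The bracket syntax is taken from the example above; the function name `split_tags` and the (kind, value) output format are just assumptions for illustration, not any particular TTS library's API.

```python
import re

# Assumption: a tag is lowercase words inside square brackets, e.g. [giggles].
TAG = re.compile(r"\[([a-z ]+)\]")

def split_tags(text):
    """Split text into ("text", ...) and ("tag", ...) pairs, in order."""
    parts = []
    pos = 0
    for m in TAG.finditer(text):
        spoken = text[pos:m.start()].strip()
        if spoken:
            parts.append(("text", spoken))
        parts.append(("tag", m.group(1)))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts

line = "Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]"
for kind, value in split_tags(line):
    print(kind, value)
```

What the model then does with each tag (change tone, insert a non-speech sound) is the hard part, and this sketch says nothing about it.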
ElevenLabs has some models with support for that.
https://elevenlabs.io/blog/v3-audiotags