Comment by ineedasername
3 days ago
The emotional exaggeration is interesting, though I don't think I've come across anything quite so versatile and easy to "sculpt" as ElevenLabs and its ability to generate a voice from a description of how you want it to sound. SparkTTS allows some additional parameters, and its project on GitHub has placeholders in the code that suggest the model might be refined for more fine-grained emotional control. As it is, I've had some success with it and other models by trying to influence prosody and tonality with some heavy-handed cues in the text; the output can then be run through VC to get closer to the desired results, but it's a much more cumbersome process than Eleven.