I wonder if AI could create a "commentary" script that instructs the TTS how to read certain words or chapters. The commentary would be like an additional meta-track to help the TTS make the best reading.
That should actually be possible to do already with existing tech. I haven't seen if you can instruct Kokoro to read in a certain way, does anyone know if this is possible?
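I'm not aware of Kokoro exposing anything like an emotion instruction directly, but as a rough illustration of the "commentary meta-track" idea: an LLM pass could emit per-segment annotations (emotion, pace, pauses) which then get mapped onto whatever knobs a given TTS engine actually exposes. A minimal Python sketch; the `Annotation` structure, `render_segment` mapping, and `fake_tts` call are all hypothetical placeholders, not any real engine's API:

    from dataclasses import dataclass

    @dataclass
    class Annotation:
        """One entry of the hypothetical 'commentary' meta-track."""
        text: str                   # the passage to be read
        emotion: str = "neutral"    # e.g. "tense", "wistful" -- produced by an LLM pass
        pace: float = 1.0           # relative speaking rate
        pause_after: float = 0.3    # seconds of silence after the segment

    def render_segment(ann: Annotation, tts_speak) -> bytes:
        """Map the annotation onto whatever knobs the underlying TTS exposes.

        `tts_speak(text, speed)` stands in for a real engine call; most
        current open models (Kokoro included, as far as I know) only expose
        speed/voice, so the 'emotion' field has to be folded into those
        until engines support explicit emotion control.
        """
        speed = ann.pace
        if ann.emotion in ("tense", "excited"):
            speed *= 1.1   # crude stand-in for real emotion control
        elif ann.emotion in ("wistful", "somber"):
            speed *= 0.9
        return tts_speak(ann.text, speed=speed)

    def fake_tts(text: str, speed: float = 1.0) -> bytes:
        # Dummy backend so the sketch runs; a real one would return audio.
        print(f"[speak @ {speed:.2f}x] {text}")
        return b""

    # Example meta-track an LLM might generate for a chapter:
    track = [
        Annotation("It was a quiet morning.", emotion="neutral"),
        Annotation("Then the door burst open.", emotion="tense", pace=1.15, pause_after=0.6),
    ]
    audio = [render_segment(a, fake_tts) for a in track]

The point being that the meta-track itself is easy to generate today; the bottleneck is TTS engines giving you more than speed and voice to map it onto.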
Like with almost everything, it's an active area of research:
https://emosphere-tts.github.io/
We are getting there
Some of those samples sound like they are emoting in Korean while speaking English.
True, maybe an artifact of the training data. Here is another one:
https://www.microsoft.com/en-us/research/project/emoctrl-tts...
Try this one: https://www.hume.ai/ - I found the voice-to-voice demos interesting.
Emotion is the acting part of voice acting. That's hard to copy with AI.