Comment by yorwba

7 months ago

If you have the correct furigana, you could even detect when the TTS model picked the wrong reading and regenerate.
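
A rough sketch of that detect-and-regenerate loop, assuming you can transcribe the generated audio back to kana. `synthesize` and `transcribe_to_kana` are hypothetical callables standing in for whatever TTS and speech-recognition backends you use; neither is a real library API. The exact string comparison is also a simplification, since real transcripts would likely need a more tolerant match.

```python
from typing import Callable, Tuple


def katakana_to_hiragana(text: str) -> str:
    """Fold katakana into hiragana so readings compare regardless of script."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def synthesize_with_check(
    text: str,
    expected_reading: str,
    synthesize: Callable[[str], bytes],          # hypothetical TTS backend
    transcribe_to_kana: Callable[[bytes], str],  # hypothetical ASR backend
    max_attempts: int = 3,
) -> Tuple[bytes, bool]:
    """Regenerate audio until its transcript matches the furigana reading.

    Returns the last audio produced and whether it matched.
    """
    target = katakana_to_hiragana(expected_reading)
    audio = b""
    for _ in range(max_attempts):
        audio = synthesize(text)
        heard = katakana_to_hiragana(transcribe_to_kana(audio))
        if heard == target:
            return audio, True
    # No attempt matched; the caller decides whether to flag it for review.
    return audio, False
```

This only helps if the expected reading is itself trustworthy.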

But how do you know the furigana are correct? Unless you start out with fully human-annotated text, you need some automated procedure to add furigana, which pushes the problem from "TTS AI picked the wrong reading" to "furigana AI picked the wrong reading."

mariano54 7 months ago

Yes, it pushes the problem, but it's a much easier problem, and models like Gemini 2.5 Flash do very well.