Comment by yorwba
8 hours ago
If you have the correct furigana, you could even detect when the TTS model picked the wrong reading and regenerate.
But how do you know the furigana are correct? Unless you start out with fully human-annotated text, you need some automated procedure to add furigana, which pushes the problem from "TTS AI picked the wrong reading" to "furigana AI picked the wrong reading."
Yes, it pushes the problem, but it's a much easier problem, and models like Gemini 2.5 Flash do very well at it.
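For what it's worth, the detect-and-regenerate idea from the top of the thread could look roughly like the sketch below. It assumes you already have a trusted kana reading for the text, a TTS function, and some way to transcribe audio back to kana (for example, a speech recognizer). `synthesize` and `transcribe_kana` are hypothetical callables you would supply yourself, not the API of any real library, and the exact string comparison is deliberately naive.

```python
from typing import Callable, Optional


def kata_to_hira(text: str) -> str:
    """Fold katakana into hiragana so readings compare equal across scripts."""
    return "".join(
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana twins.
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def synthesize_checked(
    text: str,
    expected_reading: str,
    synthesize: Callable[[str, int], bytes],   # hypothetical: (text, attempt) -> audio
    transcribe_kana: Callable[[bytes], str],   # hypothetical: audio -> kana reading
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Regenerate audio until its transcribed reading matches the furigana.

    Returns the first matching audio, or None if every attempt was misread.
    """
    target = kata_to_hira(expected_reading)
    for attempt in range(max_attempts):
        audio = synthesize(text, attempt)
        if kata_to_hira(transcribe_kana(audio)) == target:
            return audio
    return None
```

In practice the transcriber can be wrong too, so a None result probably means "flag for a human" rather than "the TTS is definitely broken."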