Comment by konovalov-nk
1 day ago
In general, there aren't really models yet that can understand the nuances of your speech. Gemini 2.5's voice mode changed that only recently; I think it can understand emotions, but I'm not sure it can detect things like accent and mispronunciation. The problem is data: we'd need a large corpus of audio samples labeled with exactly how each one mispronounces a word, so the model can cluster those patterns. Maybe self-supervised techniques without human feedback could get there somehow. Other than that, I don't see how it's even possible to train such a model with what's currently available.
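(The "cluster those" idea could be sketched roughly like this: if you already had per-word audio embeddings from some self-supervised encoder, plain k-means might group pronunciation variants without human labels. This is a toy illustration with synthetic vectors standing in for real embeddings, not a claim that it works on real speech.)

```python
# Hypothetical sketch: cluster per-word audio embeddings so that
# different pronunciation variants of the same word group together,
# with no human labels. Synthetic vectors stand in for embeddings
# you'd get from a real self-supervised speech encoder.
import numpy as np

def kmeans(embeddings, k, iters=50, seed=0):
    """Plain k-means; embeddings is an (n, d) array of audio features."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned samples.
        for j in range(k):
            if (labels == j).any():
                centers[j] = embeddings[labels == j].mean(axis=0)
    return labels, centers

# Toy data: two well-separated synthetic "pronunciation variants"
# of one word, 20 samples each, in an 8-dimensional feature space.
rng = np.random.default_rng(1)
variant_a = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
variant_b = rng.normal(loc=1.0, scale=0.1, size=(20, 8))
samples = np.vstack([variant_a, variant_b])

labels, _ = kmeans(samples, k=2)
print(labels)
```

Real mispronunciation detection would need far more than this (phoneme alignment, speaker normalization, and some way to tell "accent" from "error"), which is exactly the labeling problem above.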