Comment by aizk
1 day ago
Nice! I'm curious whether your software can pick up really subtle details, for instance pitch accent in Japanese, which is basically NEVER covered in beginner-level courses but is useful to be aware of as a language learner.
Thanks! No, it cannot do this yet, because it uses a voice-to-text-to-voice pipeline, and pitch information is lost once speech is transcribed to text. I think models are heading in that direction, though, and voice-to-voice models are getting better.
For pitch accent, shadowing is a great way to improve. For example, you can pause and repeat the tutor's messages, or read the word aloud when doing flashcard reviews (copying the flashcard audio).