Comment by ianbicking
1 day ago
I've been thinking and playing slightly with this concept myself. A few thoughts:
1. Using a standard transcription service is pretty tricky because it's going to correct the user's speech. Or make it incorrect! Standard transcription is predicated on the speaker saying things correctly.
2. I've tried sending the audio directly to OpenAI to address this issue. I can't say if it works or not. It's very hard to test or understand a system without a transcript as a source of truth!
3. I'd like to learn a new language as a beginner, and all of these AI systems work poorly for this. It's great to immerse the learner in the language, but if you know NOTHING then it's not that helpful.
4. Language learning needs to be MUCH more multimodal than a standard chat. Especially as a beginner.
5. The AI should be generating translations and explanations alongside its responses. I'd like to be able to inspect everything the AI says (in the language I'm learning) to understand it.
6. Emoji would be another easy way to annotate the text.
7. I think giving the user/AI a subject to talk about would be helpful. Again, a subject that is not language-based would be great, like an image or something.
8. As a very new learner I would like an experience where I respond in my native language and then I'm told how to translate this to the language I'm learning. This should include a pronunciation guide. Then I should repeat the phrase I'm given.
9. I should still be able to ask questions in my native language and probably get a response in my native language. But with some prompting the AI should be able to distinguish a question from a phrase I want translated.
10. For low latency it's nice to produce the spoken text quickly, but you can still get the LLM to produce _more_ material immediately afterward. This is where things like translations can be generated.
11. You probably don't have timestamps on your TTS, but if you did and could highlight words as they were spoken that would be _great_. Probably worth choosing a TTS provider with that in mind.