Comment by fragmede
3 months ago
> I'm not looking for STT->AI->TTS, I'm looking for truly good voice-to-text experience
Umm, ah, wait no, uhh yes you are. Unless, hang on, you are possessed of greater umm speech capabilities than most, wait never mind, start over. Unless you never make a mistake while talking, you want AI to take out the "three, wait no four" and leave just "four" in the output, depending on your use case.
It’s the TTS layer that is weird. I’m in the same boat: speech output is just a much worse modality than text whenever text is an option.
Agreed for a lot of use cases. RCLI supports a text-only mode (pass the --no-speak flag, or just type in the TUI instead of using push-to-talk). TTS makes sense for hands-free / eyes-free scenarios, but we don't force it.