Comment by wolvoleo

4 days ago

It sounds like you mean STT not TTS there?

You're right, in my rage I typoed. It's really frustrating: even friends will text me and their text makes no sense, and 2 minutes later, "STUPID VOICE TO TEXT". I have a few friends who drive trucks, so they need to be able to use their voice to communicate.

  • Better speech transcription is cool, but that feels kinda contrived. Phone calls exist, so do voice messages sent via texting apps, and professional drivers can also just wait a bit to send messages if they really must be text; they're on the job, but if it's really that urgent they can pull over.

  • I have to say that OpenAI's Whisper model is excellent. If you could leverage that somehow, I think it would really improve things. I run it locally myself on an old PC with a 3060 card. This way I can run Whisper large, which is still speedy on a GPU, especially with faster-whisper. An added bonus is the language autodetection, which is great because I speak 3 languages regularly.

    I think there are even better models now, but Whisper still works fine for me. And there's a big ecosystem around it.

    • I wonder what the wattage difference is between the iPhone STT and Whisper? How many seconds would the iPhone battery last?
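
For anyone curious what the local Whisper setup mentioned above might look like, here is a minimal sketch using the faster-whisper Python package. It is not the commenter's actual configuration: the "large-v3" model size, the float16 compute type, and the helper names `format_segment` and `transcribe` are assumptions chosen for illustration, and it presumes an NVIDIA GPU with CUDA available.

```python
from typing import List, Tuple


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as '[start -> end] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(path: str, model_size: str = "large-v3") -> Tuple[str, List[str]]:
    """Transcribe an audio file and return (detected language, formatted lines)."""
    # Imported lazily so format_segment works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: a CUDA-capable card (such as a 3060) running in half precision.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")

    # No language argument is passed, so the model detects the language itself.
    segments, info = model.transcribe(path, beam_size=5)

    # segments is a lazy generator; iterating it is what runs the transcription.
    lines = [format_segment(s.start, s.end, s.text) for s in segments]
    return info.language, lines
```

Calling `transcribe("memo.wav")` (a hypothetical file name) would return the detected language code along with the timestamped lines. On a machine without a GPU, switching to `device="cpu"` with `compute_type="int8"` should also work, just more slowly.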