Comment by solatic

5 hours ago

Why does the voice need to be sent to the server? Why not perform speech-to-text on-device? Is the p10 phone/laptop not capable of this yet, despite the "dictation" features I see in every modern OS?

An eventual goal is likely to allow interacting with the LLM directly via audio tokens for both input and output, skipping TTS and STT completely.