Comment by TuringNYC
1 day ago
Also I noticed your app doesn't work without a network connection, so I'm assuming you're doing all the TTS and STT server-side. Curious how practical that is w/r/t latency? Any plans to do it all on-phone?
(Probably a more fringe request, but I'm asking because I do all my language learning on commuter trains w/o a good connection.)
Exactly, it's all server-side. There are no plans for this. The main issue I see with doing it on-device is the LLM piece. Even with some large models like Llama 4 Maverick, the tutor just struggles to properly teach and understand the student; it's not viable IMO.
Intelligence is key here, especially as the context grows (because of the tutor's memory of the student) and model quality degrades with longer contexts.
Another major issue is TTS voice quality, though small local models seem to be improving quickly there.
EDIT: You're right, latency is also a big deal. You need to get each piece under a second, and the LLM part would be especially slow on mobile devices.
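To make that budget concrete, here's a minimal sketch of timing one STT -> LLM -> TTS turn against a one-second-per-stage target. Everything here is illustrative: the stage functions are placeholders, not the app's actual pipeline, and the one-second threshold is just the rough target mentioned above.

    # Minimal sketch of a per-turn latency budget for a voice tutor,
    # assuming a simple STT -> LLM -> TTS pipeline. All stage functions
    # are illustrative stand-ins, not the app's real code.
    import time

    BUDGET_PER_STAGE_S = 1.0  # rough target: each piece under a second

    def timed(stage_name, fn, *args):
        """Run one pipeline stage and report how it did against the budget."""
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        status = "OK" if elapsed <= BUDGET_PER_STAGE_S else "OVER BUDGET"
        print(f"{stage_name}: {elapsed:.2f}s ({status})")
        return result

    def speech_to_text(audio):  # placeholder for the STT call
        return "transcribed student utterance"

    def tutor_reply(text):  # placeholder for the LLM call, the slow piece on-device
        return "tutor response"

    def text_to_speech(text):  # placeholder for the TTS call
        return b"synthesized audio"

    def handle_turn(audio):
        """Time one full turn; total budget is ~3s across the three stages."""
        text = timed("STT", speech_to_text, audio)
        reply = timed("LLM", tutor_reply, text)
        return timed("TTS", text_to_speech, reply)

    handle_turn(b"raw microphone audio")

With real models plugged in, the LLM stage is the one that typically blows the budget on a phone, which is the point above: even if on-device STT and TTS get fast enough, the LLM piece dominates the turn time.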