Comment by vimy

19 hours ago

> We wanted something that would talk with us — realistically, in full conversations — and actually help us improve. So we built it ourselves. The app relies on a custom voice AI pipeline combining STT (speech-to-text), TTS (text-to-speech), LLMs, long term memory, interruptions, turn-taking, etc. Getting speech-to-text to work well for learners was one of the hardest parts — especially with accents, multi-lingual sentences, and noisy environments. We now combine Gemini Flash, Whisper, Scribe, and GPT-4o-transcribe to minimize errors and keep the conversation flowing.

Your prompt can't do this. I know because I've been trying to build something similar, and a prompt alone just isn't enough. You need multiple LLMs and custom code working together to achieve realistic conversations.
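To make the "custom code" part concrete, here is a rough sketch of one way several STT outputs could be combined. The quoted post does not say how the results are merged, so everything here is an assumption: the function names (`normalize`, `pick_consensus`), the engine labels, and the consensus rule (keep the transcript that agrees most with the others, measured with the standard library's `difflib.SequenceMatcher`) are all hypothetical. No real STT API is called; the transcripts are passed in as plain strings.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def pick_consensus(hypotheses: dict[str, str]) -> tuple[str, str]:
    """Return the (engine, transcript) pair that agrees most with the others.

    `hypotheses` maps a hypothetical engine label to its transcript.
    Agreement is the mean character-level similarity to every other transcript.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    if len(hypotheses) == 1:
        return next(iter(hypotheses.items()))

    def agreement(engine: str) -> float:
        mine = normalize(hypotheses[engine])
        others = [normalize(t) for e, t in hypotheses.items() if e != engine]
        total = sum(SequenceMatcher(None, mine, other).ratio() for other in others)
        return total / len(others)

    # On a tie, max() keeps the first engine in insertion order.
    best = max(hypotheses, key=agreement)
    return best, hypotheses[best]


if __name__ == "__main__":
    # Made-up transcripts of one learner utterance; labels are placeholders.
    results = {
        "engine_a": "i want to order a coffee",
        "engine_b": "I want to order a coffee",
        "engine_c": "i won't to order a copy",
    }
    engine, text = pick_consensus(results)
    print(engine, "->", text)
```

A single prompt cannot do this kind of cross-checking because it only ever sees one transcript. A real pipeline would likely go further (word-level alignment, confidence scores, language detection), but even this simple vote shows why the orchestration lives in code around the models.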