Comment by rpozarickij
3 days ago
Grok's updated voice mode is indeed impressive. I wish there were a way to disable automatic turn detection so that it wouldn't treat silence as the end of my response. I like Claude's approach (you need to tap to end your response), but it's not very reliable: sometimes it abruptly cuts off my response without waiting for me to tap.
I was pleasantly surprised that Grok even supports Lithuanian (to some degree) in voice mode, since it's quite a niche language. Grok's responses themselves are alright, but ChatGPT and Gemini far surpass it in speech recognition and speech synthesis.
> Grok's updated voice mode is indeed impressive. I wish there was a way to disable automatic turn detection, so that it wouldn't treat silence as an end of the response.
You can work around that by instructing the model to use "radio etiquette": only respond after the other party says "over". It will still feel compelled to answer when it detects silence, and you can't prevent that, but you can instruct it to reply with only a short "mhm" until you say "over". Feels very natural.
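The comment above implements the rule purely through the prompt. If you control the client, the same "over" rule could be enforced outside the model. Here is a minimal sketch of that idea, assuming a hypothetical setup where your code receives the transcript on every pause and decides whether to request a full answer or just a short acknowledgment. `turn_is_over` and `reply_for` are illustrative names, not part of any Grok, Claude, or ChatGPT API.

```python
import re

# Hypothetical client-side "radio etiquette" rule: give a full answer only
# once the user's turn ends with the word "over"; on any earlier pause,
# reply with a short "mhm" instead.
# \b keeps words like "moreover" from matching; \W*$ allows trailing
# punctuation such as "Over." or "over!".
_OVER = re.compile(r"\bover\W*$", re.IGNORECASE)


def turn_is_over(transcript: str) -> bool:
    """Return True if the transcript's final word is "over"."""
    return bool(_OVER.search(transcript.strip()))


def reply_for(transcript: str) -> str:
    """Choose a reply mode for a silence event (assumed to fire on each pause)."""
    return "full" if turn_is_over(transcript) else "mhm"


if __name__ == "__main__":
    for t in ["So I was thinking about", "What do you think? Over."]:
        print(repr(t), "->", reply_for(t))
```

The obvious trade-off is false positives: a sentence that happens to end in "over" ("I think it's over") will also end the turn, which is presumably why radio operators use it as a fixed, standalone token.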
Like most models I've used with this old hack, it will immediately start role-playing and also end its own responses with "over".
This is such a cool idea. I wonder whether it's possible to define a custom Personality in Grok's voice settings that would do this. Unfortunately, I can't test it right now: on my phone (iPhone 15 Pro Max), the Personality creation screen closes immediately after I open it. Might be a bug or some other issue.
This is such a great, obvious(?) idea. I've always hated feeling "rushed" whenever I talk to a voice agent that doesn't give me enough time to think.
Yes, their voice mode is pretty good and also works with Polish (much better than a few months ago). I wish they also had a 'push to talk' option (walkie-talkie style, with a big button) alongside 'automatic', similar to what Perplexity allows.
It would also be great if they added voice mode in the browser (again, like Perplexity).
> Also would be great if they added voice mode in browser
There seems to be a voice mode button in the prompt input box at ~29:00 of the Grok 4 announcement video. So perhaps they're working on this, but it's hidden from the public.
I find that with automatic turn detection, models work better if you put this in the system prompt: "if it seems the user hasn't completed their thought yet, output silence". This hack works around their compulsive need to output something.
Even better if you can just use "umm"s, like in a human conversation.
I feel like they should train a dumb model that does nothing but recognize when someone has finished talking, and use that to determine when to stop listening and start responding. Maybe it could even run on the phone?
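A trained end-of-turn classifier is beyond a comment-sized example, but the decision it would make can be sketched with a rule-based stand-in. This sketch assumes a hypothetical pipeline that supplies the running transcript and the current silence duration; the word list and the 700 ms / 2500 ms thresholds are arbitrary assumptions, not values from any shipping voice product. A small model would replace the hand-written `CONTINUATION_WORDS` check.

```python
# Rule-based stand-in for a small end-of-turn model (illustrative only).
# Words that suggest the speaker is still mid-thought; assumed, English only.
CONTINUATION_WORDS = {
    "and", "but", "so", "because", "or", "um", "umm", "uh", "like", "the", "a", "to",
}


def end_of_turn(transcript: str, silence_ms: int,
                base_ms: int = 700, extended_ms: int = 2500) -> bool:
    """Return True if the assistant should stop listening and respond."""
    words = transcript.lower().split()
    if not words:
        return False  # nothing said yet: keep listening
    last = words[-1].strip(".,!?;:")
    trailing_comma = transcript.rstrip().endswith(",")
    # Trailing fillers or conjunctions make an unfinished thought more likely,
    # so wait longer before treating the silence as the end of the turn.
    mid_thought = last in CONTINUATION_WORDS or trailing_comma
    threshold = extended_ms if mid_thought else base_ms
    return silence_ms >= threshold
```

For example, `end_of_turn("I want to go to the store and", 800)` returns `False` because the trailing "and" raises the threshold to 2500 ms, while `end_of_turn("What's the weather in Vilnius?", 800)` returns `True`. A check this cheap could easily run on a phone.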
Lithuanian sounds so weird on ChatGPT, though, almost like the way my kids speak, with a sort of English accent. Regardless, it gives my parents a superpower (when it actually works, hehe).
> you need to tap in order to end the response
I hope that can be turned off while driving...