Comment by bogtog
2 days ago
Using voice transcription is nice for fully expressing what you want, so the model doesn't need to make guesses. I'm often voicing 500-word prompts. If you talk in a winding way that looks awkward in text, that's fine; the model will almost certainly be able to tell what you mean. Using voice-to-text is my biggest suggestion for people who want to use AI for programming.
(I'm not a particularly slow typist. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I also need to think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable. In general, I think just this lowered friction makes me much more willing to fully describe what I want.)
You can also ask it, "Do you have any questions?" I find that saying "if you have any questions, ask me, otherwise go ahead and build this" rarely produces questions for me. However, if I say "Make a plan and ask me any questions you may have", then it usually has a few questions.
I've also found a lot of success when I tell Claude Code to emulate a specific piece of code I've previously written, either within the same project or something I've pasted in.
> I'm not a particularly slow typist. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I also need to think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable.
This doesn't feel relatable at all to me. If my writing speed is bottlenecked by thinking about what I'm writing, and my talking speed is significantly faster, that just means I've removed the bottleneck by not thinking about what I'm saying.
It's often better to segregate creative and inhibitive systems even if you need the inhibitive systems to produce a finished work. There's a (probably apocryphal) conversation between George RR Martin and Stephen King that goes something like:
GRRM: How do you write so many books?... Don't you ever spend hours staring at the page, agonizing over which of two words to use, and asking 'am I actually any good at this?'
SK: Of course! But not when I'm writing.
That's fair. I sometimes find myself pausing or just talking in circles as I'm deciding what I want. I think when I'm speaking, I feel freer to use less precise/formal descriptions, but the model can still correctly interpret the technical meaning
In either case, different strokes for different folks, and what ultimately matters is whether you get good results. I think the upside is high, so I broadly suggest people try it out
Alternatively: some people are just better at / more comfortable thinking in auditory mode than visual mode & vice versa.
In principle I don't see why they should have different amounts of thought. That'd be bounded by how much time it takes to produce the message, I think. Typing permits backtracking via editing, but speaking permits 'semantic backtracking' which isn't equivalent but definitely can do similar things. Language is powerful.
And importantly, to backtrack in visual media I tend to need to re-saccade through the text with physical eye motions, whereas with audio my brain just has an internal buffer I can access at the speed of thought.
Typed messages might have higher _density_ of thought per token, though how valuable is that really, in LLM contexts? There are diminishing returns on how perfect you can get a prompt.
Also, audio permits a higher bandwidth mode: one can scan and speak at the same time.
It's kind of the point. If you start writing it, you'll start correcting it and moving things around and adding context and fiddling more and more.
And your 5-minute prompt just turned into half an hour of typing.
With voice you get on with it, and then start iterating, getting Claude to plan with you.
I've not been impressed with agentic coding myself so far, but I did notice that using voice works a lot better imo, keeping me focused on letting the agent do the work.
I've also found it good for stopping me doing the same thing in Slack messages. I ramble my general essay to ChatGPT/Claude, then get them to summarize and rewrite it as a few lines in my own voice. Stops me spending an hour crafting a Slack message, and it tends to soften the tone.
I prefer writing myself, but I could see the appeal of producing a first draft of a prompt by dumping a verbal stream of consciousness into ChatGPT. That might actually be kind of fun to try while going on a walk or something.
I don't feel restricted by my typing speed; speaking is just so much easier and more convenient. The vast majority of my ChatGPT usage is on my phone, and that makes speech-to-text a no-brainer.
100% this, I built laboratory.love almost entirely with my voice and (now-outdated) Claude models
My go-to prompt finisher, which I have mapped to a hotkey due to frequent use, is "Before writing any code, first analyze the problem and requirements and identify any ambiguities, contradictions, or issues. Ask me to clarify any questions you have, and then we'll proceed to writing the code"
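For anyone wanting the same setup, here's a minimal sketch of binding a canned prompt finisher to a hotkey in Python with the `keyboard` package (the `ctrl+alt+f` combo is just my placeholder; on macOS/Linux the package needs elevated or accessibility permissions, and real dictation apps do this more robustly):

```python
import keyboard  # pip install keyboard

FINISHER = (
    "Before writing any code, first analyze the problem and requirements "
    "and identify any ambiguities, contradictions, or issues. Ask me to "
    "clarify any questions you have, and then we'll proceed to writing the code"
)

# Type the canned finisher into whatever window currently has focus.
keyboard.add_hotkey("ctrl+alt+f", lambda: keyboard.write(FINISHER))
keyboard.wait()  # block forever so the hotkey stays registered
```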
> if you talk in a winding way …
My regular workflow is to talk (I use VoiceInk for transcription) and then say "tell me what you understood". This puts your words into a well-structured format, lets you make sure the CLI agent got it, and expressing it explicitly likely also helps it stay on track.
Surprised AI companies are not making this workflow possible, instead of leaving it up to users to figure out how to get voice text into the prompt.
> Surprised AI companies are not making this workflow possible, instead of leaving it up to users to figure out how to get voice text into the prompt.
Claude on macOS and iOS has native voice-to-text transcription. I haven't tried it, but since you can access Claude Code from the apps now, I wonder if you can use the Claude app's transcription as input into Claude Code.
> Claude on macOS and iOS has native voice-to-text transcription
Yeah, Claude/ChatGPT/Gemini all offer this, although Gemini's is basically unusable because it will immediately send the message if you stop talking for a few seconds
I imagine you totally could use the app transcript and paste it in, but keeping the friction to an absolute minimum (e.g., just needing to press one hotkey) feels nice
All the mobile apps make this very easy.
That's a fun idea. How do you get the transcript into Claude Code (or whatever you use)? What transcription service do you use?
I use Raycast + Whisper Dictation. I don't think there is anything novel about it, but it integrates nicely into my workflow.
My main gripe is that when the recording window loses focus, I haven't found a way to bring it back and continue the recording session. So occasionally I have to start from scratch, which is particularly annoying if it happens during a long-winded brain dump.
I'm not the person you're replying to, but I use Whispering connected to the whisper-large-v3-turbo model on Groq.
It's incredibly cheap and works reliably for me.
I have got it to paste my voice transcriptions into Chrome (Gemini, Claude, ChatGPT) as well as Cursor.
https://github.com/EpicenterHQ/epicenter
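If you'd rather script it than use a GUI, hitting the same Groq-hosted model directly is only a few lines. A sketch assuming the standard `openai` Python client, Groq's OpenAI-compatible endpoint, and a `note.wav` you've already recorded (not Whispering's actual internals):

```python
import os
from openai import OpenAI  # pip install openai

# Groq exposes an OpenAI-compatible API, so the standard client works.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

with open("note.wav", "rb") as f:  # any audio file you've recorded
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
    )
print(transcript.text)
```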
I use Handy with Claude Code. It's nice to just have a key combo to transcribe into whatever has focus.
https://github.com/cjpais/Handy
Love Handy. I use it too when dealing with LLMs. The other day I asked ChatGPT to generate interview questions based on a job description, and then I answered using Handy. So cool!
I use Spokenly with local Parakeet 0.6B v3 model + Cerebras gpt-oss-120b for post-processing (cleaning up transcription errors and fixing technical mondegreens, e.g., `no JS` → `Node.js`). Almost imperceptible transcription and processing delay. Trigger transcription with right ⌥ key.
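The post-processing step is essentially a second LLM pass over the raw transcript. A rough sketch of the idea (Cerebras also speaks the OpenAI protocol; the endpoint, model id, and prompt here are my assumptions for illustration, not Spokenly's internals):

```python
import os
from openai import OpenAI  # pip install openai

client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",  # Cerebras' OpenAI-compatible endpoint
)

def clean_transcript(raw: str) -> str:
    """Fix transcription errors and technical mondegreens, e.g. 'no JS' -> 'Node.js'."""
    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {
                "role": "system",
                "content": (
                    "You clean up speech-to-text output for a programmer. "
                    "Fix misheard technical terms (e.g. 'no JS' -> 'Node.js') "
                    "and punctuation. Change nothing else."
                ),
            },
            {"role": "user", "content": raw},
        ],
    )
    return resp.choices[0].message.content

print(clean_transcript("please set up a no JS project with es lint"))
```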
According to Google this is the first time the phrase "technical mondegreens" was ever used. I really like it.
I built my own open-source tool to do exactly this, so I can run something like `claude $(hns)` in my terminal and then start speaking; after I'm done, claude receives the transcript and starts working. See the workflow here: https://hns-cli.dev/docs/drive-coding-agents/
Your OS might have a built-in dictation feature. Google for that and try it before online services.
There are a few apps nowadays for voice transcription. I've used Wispr Flow and Superwhisper, and both seem good. You can map some hotkey (e.g., ctrl + windows) to start recording, then when you press it again to stop, it'll get pasted into whatever text box you have open
Superwhisper offers some AI post-processing of the text (e.g., making nice bullets or grammar), but this doesn't seem necessary and just makes things a bit slower
+1 for Superwhisper. It has an offline model for transcription. And it transcribes with very high accuracy for me and great speed.
I do the same. On Mac I use MacWhisper. The transcription doesn't have to be correct: lots of times it writes the wrong word when I'm talking about technical stuff, but Claude understands which word I mean from context.
Made this tool: double-tap Ctrl to start, then another Ctrl to stop, which copies the transcript to the clipboard.
https://github.com/elv1n/para-speak/
So cool, man! I had to add a couple of fixes to be able to use it on Mac. Love it!
I use VoiceInk (needed some patches to get it to compile but Claude figured it out) and the Parakeet V3 model. It’s really good!
Thanks for the advice! Could you please share how you enabled voice transcription in your setup, and what it actually is?
I use https://github.com/braden-w/whispering with an OpenAI api key.
I use a keyboard shortcut to start and stop recording and it will put the transcription into the clipboard so I can paste into any app.
It's a huge productivity boost. OP is correct about not overthinking coherence: the models are very good at knowing what you mean (Opus 4.5 with Claude Code in my case).
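For the curious, the whole loop is small enough to sketch yourself. A rough version assuming `sounddevice`, `pyperclip`, and OpenAI's hosted Whisper; Whispering itself does more (e.g. a global hotkey instead of Enter), so treat this as an illustration:

```python
import os, tempfile, wave
import numpy as np
import pyperclip          # pip install pyperclip
import sounddevice as sd  # pip install sounddevice
from openai import OpenAI

RATE = 16_000  # sample rate in Hz

# 1. Record mono 16-bit audio until the user presses Enter.
frames = []
with sd.InputStream(samplerate=RATE, channels=1, dtype="int16",
                    callback=lambda indata, *_: frames.append(indata.copy())):
    input("Recording... press Enter to stop. ")
audio = np.concatenate(frames)

# 2. Write a temporary WAV file for the API.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    with wave.open(tmp.name, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # int16 = 2 bytes per sample
        w.setframerate(RATE)
        w.writeframes(audio.tobytes())
    path = tmp.name

# 3. Transcribe and put the text on the clipboard, ready to paste anywhere.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open(path, "rb") as f:
    text = client.audio.transcriptions.create(model="whisper-1", file=f).text
os.unlink(path)
pyperclip.copy(text)
print(text)
```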
I just installed this app and it is very nice. The UX is very clean, and whatever I say it transcribes correctly. In fact, I'm transcribing this comment with the app right now.
I am using Whisper Medium. The only problem I see is that at the end of the message it sometimes adds a "bye" or a "thank you", which is kind of annoying.
I'm already inclined to believe that with LLMs it's not worth trying to be too coherent: I have successfully used LLMs to make sense of what incoherent-sounding people say (in text).
Aquavoice, a YC company, is really good. I got it after doing a bit of research on here; there's something for Mac that's supposed to be good too.
If you want local transcription, locally running models aren't quite good enough yet.
They use right-ctrl as their trigger. I've set mine to double tap and then I can talk with long pauses/thinking and it just keeps listening till I tap to finish.
I'm using Wispr flow, but I've also tried Superwhisper. Both are fine. I have a convenient hotkey to start/end recording with one hand. Having it just need one hand is nice. I'm using this with the Claude Code vscode extension in Cursor. If you go down this route, the Claude Code instance should be moved into a separate window outside your main editor or else it'll flicker a lot
Another option is MacWhisper if someone is on macOS and doesn't want to pay for a subscription (it's a one-time payment). Pretty much all of these apps these days use Parakeet from NVIDIA, which is the fastest and best open-source model that can run on edge devices.
Also, I haven't tried it, but in the latest macOS 26 Apple updated their STT models, so the built-in voice dictation may be good enough now.
For me, on Mac, VoiceInk has been top-notch. I got tired of Superwhisper.
Spokenly on macOS with Soniox model.
Voice transcription is silly when someone can hear you talking to something that isn't exactly human; imagine explaining that you were talking to an AI. Still, when it's more than one sentence, I use voice too.
Speech also uses a different part of the brain, and maybe requires less finger coordination.
It's an AI. You might do better by phrasing it as "Make a plan, and have questions". There's nobody there, but if it's specifically directed to "have questions", you might find they are good questions! Why ask whether it has questions, if you figure it'd be better to get them? Just tell it to have questions, and it will.
It's like a reasoning model. Don't ask; prompt "and here is where you come up with apropos questions", and you shall have them, possibly even useful ones.