Comment by jorl17

2 days ago

While I think this is true:

> If you use GenAI on things that you couldn't approach alone, it's an incredible tool.

I think this isn't true in all cases:

> If you use it on stuff that you're pretty good at, it's not a gamechanger (and if you're an expert, it's a minor boost at best).

I think even then there's a divide.

I mostly work on greenfield projects (and love it!). For these, AI has been a literal game changer: our projects are built faster, with one or two orders of magnitude more automated tests, and all our quality metrics are up.

Meanwhile, nearly all of my friends complain that AI doesn't help them. But they mostly work in very large existing codebases.

Still, even in large projects, I think AI (the expensive variant) has been a complete game changer for me. Sure, I spend a lot on tokens, but I just feel happier and enjoy what I do more. The refrain people keep repeating about "thinking at a higher abstraction level" is exactly what I feel. I really am thinking about architecture and larger patterns instead of the boring nitty-gritty (which wasn't boring at all when I was a kid learning to code!...)

To me, a key factor in all of this has been dictation. Most of the time, I don't type -- I use voice-to-text. I don't even read what comes out of it -- the LLMs get it (even though it is mostly unintelligible to anyone else).

This means that when I'm planning a big feature, I give the LLM a gigantic brain dump in a pure stream-of-consciousness way: going through ideas, pros and cons, edge cases, what exists, what doesn't exist, where I'm sure of something, and where I'm not sure and want the LLM to browse the state of the art. Sometimes I spend 20 minutes just talking into the microphone before I send the first prompt. When I pair that with Opus, I find that I can build much faster and explore alternative designs much more often as well.

I keep telling all my friends: use voice-to-text and brain-dump to the computer. But they refuse... I couldn't imagine having to type everything nowadays. Even though I'm a fast typist, typing is still much slower than the speed of my thought, which, granted, is also faster than the speed of my voice.

In effect, I filter much less, but I've come to think that's a positive with the good LLMs: I throw in all the edge cases and what-ifs I'm thinking about -- all those years of experience dealing with similar systems.

If I wanted to go back to working in-office, that would be my major problem: I need to be able to talk to my computer all the time, loudly, while pacing around my room.

Yay for dictation! It's so nice to just think aloud and then have an easily editable record of your thoughts, even when you aren't feeding the outputs to LLMs.

How do you use voice-to-text? You mean in the browser? I'm only familiar with Claude Code, which I have installed on a remote server, and there voice-to-text obviously doesn't work. I have to type, which is tiring.

  • I’ve installed Hex on OS X. You just hold down a hotkey to talk, and it writes into whatever text entry widget is focused.

  • There are many tools for this, and I use the one that I tried first, so there are probably better-suited alternatives out there.

    I run MacWhisper, and I paired it with BetterTouchTool so it triggers on any input when I double-tap the fn/globe key.

    Obviously all of my transcriptions through it are entirely local. I usually use the Large V3 Turbo model, though in the beginning I used Parakeet v3, which was slightly faster but produced more mistakes (and kept a lot of filler words -- 'ahhm', 'hummm').

    However, if I'm interacting with the Claude or ChatGPT/Codex apps, I often use their built-in voice recognition instead, because it tends to be more accurate, especially with punctuation, albeit significantly slower. OpenAI's is noticeably better than Anthropic's, but I feel like that gap has closed a bit recently (might be all in my head, though).

    Like I said I don't really care about mistakes in the transcription. If you try to read it, it feels like a fever dream, but the LLMs get it.

    If I say "taken", it may come out as "take and". If I say "all the while calling the method", it might come out as "although a while. while. call in the met of". This is a rather extreme example, but I've seen such things happen. The repeated words show up because I talk with "ums and ahs" and do repeat words, or just the ends of words. It's very rare for the models, especially Opus, to have any issue with these transcriptions. When they do, they tend to tell me they didn't get it, or I catch them in the act. But, like I said, it really is very, very rare.

    As an example, I've got quite a significant feature to work on, which would probably have taken me weeks to design and implement, and today I used this exact method to work out the plan:

    - I spent the last couple of days researching the feature in my off-time and just "thinking about it in the background" (think: I fall asleep thinking about it -- a habit I've always had)

    - I spent ~25 minutes brainstorming out loud. The transcript came to ~17,000 characters and ~3,000 words.

    - I sent that transcript, in Cursor, to Opus 4.6-High with instructions on how to iterate on it and how I want to work while planning

    - I then spent about 1.5 hours iterating with it and building the actual plan (and a supporting technical decision document, which points to the FULL transcript of the whole interaction). Many of my original ideas made it into the final plan, others got scrapped or simplified, and new ones got added. It contains a mix of my ideas, Opus' ideas, and our push-back on "each other".

    - Now I have a multi-step plan, with at least 8 distinct stages, for this massive feature, which I know for a fact would have taken me weeks to implement. I expect to implement it in at most 3 days, and very likely in a day and a half.

    Final context (regarding your Claude Code question): my main development environment is Cursor, though for personal projects I also use Codex and Claude Code. For the initial "researching of the feature in my off-time", I often have conversations with ChatGPT and Claude where they have no access to the codebase, and I have them find out what the state of the art on specific topics is. All of these interactions also involve me using my voice to talk to them (though nowadays I don't typically use their voice mode; I just let them reply in text). Then I brood over that.

This is exactly my workflow, and it’s just incredible. I use Aqua and Wispr Flow, depending on which one seems to be returning the best results that day.