Comment by mjr00
1 month ago
It's the Jarvis Effect.
For years we had people trying to make voice agents, like Iron Man's Jarvis, a thing. You had people super bought into the idea that if you could talk to your computer and say "Jarvis, book me a flight from New York to Hawaii" and it would just do it, like in the movies, that was the future, that was sci-fi, it was awesome.
But it turns out that voice sucks as a user interface. The only time people use voice controls is when they can't use other controls, i.e. while driving. Nobody is voluntarily booking a flight with their Alexa. There's a reason every society on the planet shifted from primarily phone calls to texting once the technology was available!
It's similar with vibe coding. People like Yegge are extremely bought into the idea of being a hyperpowered coder, sitting in a dimly lit basement in front of 8 computer screens, commanding an army of agents with English, sipping coffee between barking out orders. "Agent 1, refactor that method to be more efficient. Agent 5, tighten up the graphics on level 3!"
Whether or not it's effective or better than regular software development is secondary, if it's a concern at all. The purpose is the process. It's the future. It's sci-fi. It's awesome.
AI is an incredible tool and we're still discovering the right way to use it, but boy, "Gas Town" is not it.
This is confusing. Voice is not a UI, it's an input device. When I call my bank and have to input some numbers into the automated system, I prefer saying them to typing them. The phone menu system is the UI; fingers and voice are two different input modes for the same UI.
The problem with Alexa booking tickets is not the use of my voice but that there are a lot of decisions to be made (comparison shopping, seat selection, etc.). Alexa can't read my mind to make the trade-offs I would make, although it could ask me 10 zillion questions.

The difference between voice/ears and fingers/eyes is the bandwidth of information transfer, but also the availability of the tools. Hands and eyes may be busy as in your car example, but they're also busy if I'm carrying a toddler around the house, or can't be bothered to reach into my pocket, or am already using my phone for something else (game, video, etc.). So voice is a good option for many tasks.

And LLMs/agents do have the potential to make more tasks (simple ones, not booking tickets) accessible to voice, since "AI as UI" is where it holds the most potential IMHO. And that's great, because we need all the help we can get to avoid taking our phones out of our pockets and getting sucked into random tangents like HN comment threads just because we wanted to check the weather.
"Agent 1, refactor that method to be more efficient. Agent 5, tighten up the graphics on level 3!"
I'm not sure it's even that; his description of his role in this is:
"You are a Product Manager, and Gas Town is an Idea Compiler. You just make up features, design them, file the implementation plans, and then sling the work around to your polecats and crew. Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it."
And he says he isn't reviewing the code; he lets agents review each other's code, from the look of it. I am interested to see the specs/feature definitions he's giving them; that seems to be one interesting part of his flow.
Yeah maybe the refactoring was a bad example because it implies looking at the code. It's more like "Agent 1, change the color of this widget. Agent 9, add a red dot whenever there's a new message. Agent 67, send out a marketing email blast advertising our new feature."
Assuming both agents are using the same model, what value could the reviewer agent add for the agent writing the code? It feels like "thinking mode" but with extra steps, and more chance of getting them stuck in a loop trying to overcorrect some small, inane detail.
He does cover this later:
"I implemented a formula for Jeffrey Emanuel’s “Rule of Five”, which is the observation that if you make an LLM review something five times, with different focus areas each time through, it generates superior outcomes and artifacts. So you can take any workflow, cook it with the Rule of Five, and it will make each step get reviewed 4 times (the implementation counts as the first review)."
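A minimal sketch of what that review loop might look like, assuming a generic `call_llm` placeholder (the thread doesn't show Gas Town's actual implementation, and the focus areas here are invented for illustration):

```python
# Hypothetical sketch of the "Rule of Five" review loop described above.
# `call_llm` stands in for whatever model invocation the real workflow uses.

FOCUS_AREAS = [
    "correctness",     # the implementation itself counts as the first pass
    "error handling",
    "security",
    "performance",
    "readability",
]

def call_llm(prompt: str) -> str:
    # Placeholder: a real workflow would call a model API here.
    return f"notes for: {prompt.splitlines()[0]}"

def rule_of_five(artifact: str) -> list[str]:
    """Run one review pass per focus area and collect the notes."""
    reviews = []
    for focus in FOCUS_AREAS:
        prompt = f"Review the following with a focus on {focus}:\n{artifact}"
        reviews.append(call_llm(prompt))
    return reviews

notes = rule_of_five("def add(a, b): return a + b")
print(len(notes))  # one set of notes per focus area
```

The point of varying the focus each pass, rather than just asking "review this" five times, is presumably to push the model out of repeating the same observations.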
And I guess more generally, there is a level of non-determinism in there anyway.
> Nobody is voluntarily booking a flight with their Alexa.
Rich people use voice because they have disposable income and they don't care if a flight is $800 or $4,000. They are likely buying business/first class anyways.
Tony Stark certainly doesn't care. Elon Musk certainly uses voice to talk to his management team to book his flights.
The average person doesn't have the privilege of using voice because they don't have enough fuck-you-money to not care about prices.
As someone who's friends with executive assistants: rich people use executive assistants (humans) because they are busy and/or value their time more than money and don't want to bother with the details. None of them are using voice assistants.
> Tony Stark certainly doesn't care. Elon Musk certainly uses voice to talk to his management team to book his flights.
Delegating to a human isn't the same as using a voice assistant; this should be obvious, unless you believe that managers are doing all the real work and every IC is a brainless minion. Maybe far in the future when there's AGI, but certainly not today.
> The average person doesn't have the privilege of using voice because it doesn't have enough fuck-you-money to not care for prices.
You can order crap off Amazon for the same price as you would through the website with your Alexa right now, but Amazon themselves have admitted approximately 0% of people actually do this which is why the entire division ended up a minor disaster. It's just a shitty interface in the same way that booking a flight through voice is a shitty interface.
> None of them are using voice assistants.
Rich people will literally just talk to their executive assistants and ask for what they want. They may use phone calls, voicemails, emails, and texts. But you'd be crazy to argue that they never use just voice with their IRL assistants.
Your point is that voice is a terrible interface to get something done. My point is that some people have the privilege to use voice to get something done.
Some companies are just trying to remove the human who is taking the voice command and replace them with AI.