Comment by danfritz
2 days ago
Every time I see a post like this on HN I try again, and every time I come to the same conclusion. I have never seen an agent manage to pull something off that I could instantly ship. It still ends up being very junior code.
I just tried again and asked Opus to add custom video controls around ReactPlayer. I started in Plan mode, and the plan looked overall good (it used our styling libs, existing components, icons and so on).
I let it execute the plan and behold, I have controls on the video; so far so good. Then I look at the code and see multiple issues: overuse of useEffect for trivial things, storing values in useState that should be computed at render time, failing to correctly display the time/duration of the video, and so on...
I ask a follow-up like "Hide the controls after 2 seconds" and it starts introducing more useEffects and more state, almost none of which is needed (granted, you need one effect).
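For reference, roughly what I had in mind for that one effect, plus deriving the displayed time at render time (a simplified sketch, not our actual code; the hook and names are made up):

    import { useEffect, useState } from "react";

    // The one effect you genuinely need: hide the controls some time after
    // the last pointer activity.
    function useAutoHide(timeoutMs = 2000) {
      const [lastActivity, setLastActivity] = useState(() => Date.now());
      const [visible, setVisible] = useState(true);

      useEffect(() => {
        setVisible(true);
        const id = setTimeout(() => setVisible(false), timeoutMs);
        return () => clearTimeout(id);
      }, [lastActivity, timeoutMs]);

      // Call poke() from onMouseMove / onTouchStart on the player wrapper.
      return { visible, poke: () => setLastActivity(Date.now()) };
    }

    // The displayed time is derived at render time, no extra state or effects.
    const formatTime = (s: number) =>
      `${Math.floor(s / 60)}:${String(Math.floor(s % 60)).padStart(2, "0")}`;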
Cherry on the cake: I asked it to place the slider at the bottom and the other controls above it; it placed the slider at the top...
So I suck at prompting and will start looking for a gardening job I guess...
These posts are never, never made by someone who is responsible for shipping production code in a large, heavily used application. It's always someone at a director+ level who stopped production coding years ago, if they ever did, and is tired of their engineers trying to explain why something will take more than an hour.
It is also often low-proficiency developers with their minds blown over how quickly they can build something using frameworks / languages they never wanted to learn or understand.
Though even that group probably has some overlap with yours.
Back in the day when you found a solution to your problem on Stackoverflow, you typically had to make some minor changes and perhaps engage in some critical thinking to integrate it into your code base. It was still worth looking for those answers, though, because it was much easier to complete the fix starting from something 90% working than 0%.
The first few times in your career you found answers that solved your problem but needed non-trivial changes to apply it to your code, you might remember that it was a real struggle to complete the fix even starting from 90%. Maybe you thought that ultimately, that stackoverflow fix really was more trouble than it was worth. And then the next few times you went looking for answers on stackoverflow you were better at determining what answers were relevant to your problem/worth using, and better at going from 90% to 100% by applying their answers.
Still, nobody really uses stackoverflow anymore: https://blog.pragmaticengineer.com/stack-overflow-is-almost-...
You and most of the rest of us are all actively learning how to use its replacement.
> it was much easier to complete the fix starting from something 90% working than 0%.
As an expert now though, it is genuinely easier and faster to complete the work starting from 0 than to modify something junky. The ReactPlayer example above I could do much faster, and correctly, than I could figure out what the AI code was trying to do with all those effects and refactor it. This is why I don't use AI for programming.
And for the cases where I'm not skilled, I would prefer to just gain skill, even though it takes longer than using the AI.
Anecdotally, I think you're right that the more skilled you are at something, the less utility there is in something that quickly but incompletely takes you from 0 to 90%.
But I would generally be skeptical of anybody who claims that all their work is better off starting from 0, the same way I'd be skeptical of someone who claims to not use or need to make google searches about docs/terms/issues as they work.
I'll give you an example of something I understand decently well but get a lot of use out of AI for: bash scripts and unit testing. These are not my core work, but they are a large chunk of my work. Without LLMs I would just not write a lot of bash scripts, because I found myself constantly looking things up and spending more time than expected getting the script to work across environments and ironing out bugs. I would only write absolutely essential scripts, and generally they'd not be polished enough to check in and share with the team; they'd just live on my computer in some random location. Now with LLMs I can essentially script in English and get very good bash scripts, so I write a lot more of them and it's easier for me to get them into an acceptable state worth sharing with my team.
Similarly, I really like Golang table tests but hate writing all the cases out and dealing with all the symbols/formatting. Now I can just describe all the different permutations I want and get something that I can lightly edit into being good enough.
I've also found that in domains I am knowledgeable enough about, that knowledge can translate into being better at going from ~70% to 95% with AI too. In those cases I am not necessarily using AI the same way as someone trying to go from 0->90%: usually they're describing the outcome/goals/features they want relatively informally, without knowledge of the known-unknowns and gotchas involved in implementing that. With more knowledge you can prompt LLMs with more implementation/design details and requirements, and course-correct away from bad approaches much faster than someone who doesn't know the shape of what they're trying to do. That still comes in handy a lot of the time.
Think about how much time you can save by feeding an API spec/docs into an LLM and telling it to create a Go struct for JSON (de)serialization of some monstrous interface like https://docs.cloud.google.com/compute/docs/reference/rest/v1...? Or how much easier it is to upgrade across breaking versions of a language/library when you can just bump the version, note all the places where the old code broke, and have an LLM with an upgrade guide/changelog do all the drudgery of fixing each of the 200 callsites you need to migrate to the next version.
The difference is you're generally retooling it for your purpose rather than scouring it for multiple, easily avoidable screw-ups that, if overlooked, will cause massive headaches later on.
I've spent quite a bit of time with Codex recently and come to the conclusion that you can't simply say "Let's add custom video controls around ReactPlayer." You need to follow up with a set of strict requirements to set expectations, guard rails, and what the final product should do (and not do). Even then it may have a few issues, but continuing to prompt with clearly stated problems that don't meet the requirements (or you forgot to include) usually clears it up.
Code that would have taken me a week to write is done in about 10 minutes. It's likely on average better than what I could personally write as a novice-mid level programmer.
>You need to follow up with a set of strict requirements to set expectations, guard rails, and what the final product should do (and not do).
That's usually the very hard part, and it was possible to spend a few days on something like that in the real world even before LLMs. But with LLMs it's worse, because it's not enough to have those requirements: some of them won't work for random reasons, and there are no 'rules' that can guarantee results. It's always 'try that' and 'probably this will work'.
Just recently I struggled with the same prompt producing different results between API calls, before I realized that just a few extra '\"' characters and a few extra spaces in the prompt led the model down a completely different route of logic, which produced opposite answers.
By the time I have figured out all those quirks and guardrails, I could have done it myself in 45 minutes tops.
This is very true. But each iteration of learning quirks and installing guardrails carries value forward to later sessions. These rough edges get smoother with use, is my point.
It sounds like it takes you at least 10 minutes to just write the prompt with all the details you mentioned. Especially if you need to continue and prompt again (and again?).
I mean, I typically do a lot more thinking than 10 minutes.
I’m writing some (for me) seriously advanced software that would have taken me months to write, in weeks, using Claude and ChatGPT.
It’s even unlikely I would be able to pull it off myself after a long day's work.
The LLM doesn’t replace. It works in parallel.
Not the OP, but easily. My tasks usually take at least that, and up to hours of brainstorming and planning; sometimes I'll do this over days, in between other tasks, just so I can think through all the pros and cons. Of course this has always been the way, but now I have an ongoing Claude session which I can come back to at any point, which holds the context along with my brain. It's much easier to keep the thread of what I'm working on across multiple tasks.
I used to run into this quite a bit until I added an explicit instruction in CLAUDE.md to the effect of:
> Be thoughtful when using `useEffect`. Read docs at https://react.dev/learn/you-might-not-need-an-effect to understand if you really need an effect
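For what it's worth, the before/after that page describes looks roughly like this (an illustrative sketch, not from any real codebase; duration/played are made-up props):

    import { useEffect, useState } from "react";

    // Before: the pattern the agent keeps reaching for, extra state kept in
    // sync with an effect.
    function useRemainingBefore(duration: number, played: number) {
      const [remaining, setRemaining] = useState(0);
      useEffect(() => {
        setRemaining(duration - played);
      }, [duration, played]);
      return remaining; // lags a render behind and causes an extra re-render
    }

    // After: what the linked page recommends, just derive it during render.
    function useRemainingAfter(duration: number, played: number) {
      return duration - played;
    }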
Have you tried Roo Code in "Orchestrator" mode? I find it generally "chews" the tasks I give it and then spoon-feeds them as sub-tasks to "Code" (or other) modes, leaving less room to stray from very focused, "bite-sized" changes.
I do need to steer it sometimes, but since it doesn't change a lot at a time, I can usually guide the agent and stop the disaster before it spreads.
A big caveat is I haven't tried heavy front-end stuff with it, more django stuff, and I'm pretty happy with the output.
I have a vanilla JS project. I find that even very small LLMs are able to work on it with no issue (including complete rewrites). But when I asked even large LLMs to port it to React, they all consistently failed: basic functionality broken, rapid memory leaks.
So I just stuck with vanilla JS.
n = 1 but React might not be a great thing to test this stuff with. For the man and the machine! I tried and failed to learn React properly like 8 times but I've shipped multiple full stack things in like 5 other languages no problem.
Usually for me, after a good plan, it's 90% solid working code. The problems do arise when you ask it to change the colors it chose, like light grey text over a white background. This thing still can't see, and it's a huge drawback for those who got used to just prompting away their problems.
I always assume the person either hasn't used coding agents in a while or it's their first time. Don't get me wrong, I love Claude Code, but my students are still better at getting stuff done that I can just approve and not micromanage. That's what I think everyone is missing from their commentary: you have to micromanage a coding agent. You don't have to micromanage a good student. When you don't need to micromanage anymore at all, that's when the floor falls out and everyone has a team of agents doing whatever they want to make them all billionaires, or whatever it is AI is promising to do these days.
Being around a university, I think a lot about what students are good at and what they aren't good at.
I wouldn't even think about hiring a student to do marketing work. They just don't understand how hard it is to break through people's indifference and lack the hustle. I want 10-100x more than I get out of them.
Photos in The Cornell Daily Sun make me depressed. Students take a step out the door, take a snap, then upload it. I think the campus is breathtakingly beautiful and students just don't do the work to take good photos that show it.
In coding it is all over the map. Even when I am happy with the results, they still do the first 80% that takes another 80% to put in front of customers. I can be really proud of how it turned out in the end, despite them missing the point of the design document they were handed.
I was in a game design hackathon where most of the winners were adults or teams with an adult on them. My team won player's choice. I'll take credit for my startup-veteran talent of fearlessly demonstrating broken software on stage and making it look great, and for doing project management with that in mind. One student was solid on C# and making platformers in Unity. I was the backup programmer who worked like a junior, other than driving them crazy by slowing them down with relentlessly practical project management. The other student made art that fit our game.
We were at each other's throats at the end and shocked that we won. I think I understood the value everybody brought but I'm not sure my teammates did.
I find anecdotes like yours bewildering, because I've been using Opus with Vue.js and it crushes everything I throw at it. The amount of corrections I need to make tend to be minimal, and mostly cosmetic.
The tasks I give it are not trivial either. Just yesterday I had it create a full-blown WYSIWYG editor for authoring the content we serve through our app. This is something that would have taken me two weeks, give or take. Opus looked at the content definitions on the server, queried the database for examples, then started writing code and finished it in ~15 minutes, and after another 15-20 minutes of further prompting for refinement, it was ready to ship.
Created a WYSIWYG editor or copied it off the internet like your average junior would, bugs included?
If that editor is very complicated (as they usually are) it makes sense to just opt for a library. If it's simple then AI is not required and would only reduce familiarity with how it works. The third option is what you did and I feel like it's the option with the lowest probability of ending up with a quality solution.
There are contenteditable and EditContext these days; it's not that hard to make a simple WYSIWYG editor. An LLM could figure out how to operationalize these things quicker than I could.
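A minimal sketch of what I mean, just to show how much the browser gives you for free (illustrative only; execCommand is deprecated but still widely supported):

    // The browser does most of the work for a basic WYSIWYG surface.
    const editor = document.createElement("div");
    editor.contentEditable = "true";
    editor.style.minHeight = "200px";
    editor.style.border = "1px solid #ccc";
    document.body.appendChild(editor);

    // Formatting via execCommand, or by wrapping the current Selection/Range yourself.
    function toggleBold(): void {
      document.execCommand("bold");
    }

    // The "document" you persist is just the element's HTML.
    function getContent(): string {
      return editor.innerHTML;
    }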
Yep. It sucks. People are delusional. Let's ignore LLMs and carry on...
On a more serious note:
1) Split tasks into smaller tasks just like a human would do
Would you bash your keyboard for an hour, adding all the video controls at once before even testing if anything works at all? Of course not. You would start by adding a slider and test it until you are satisfied, then move to the next video control, and so on. LLMs are the same. Sometimes they can one-shot many related changes in a single prompt, but the common reality is what you experienced: it sometimes works, but the code is suboptimal.
2) Document desirable and undesirable coding patterns in AGENTS.md (or CLAUDE.md)
If you found overuse of useEffect, document it in AGENTS.md so that next time the LLM knows your preference.
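For example, an entry along these lines (hypothetical wording; adapt it to your own conventions):

    ## React
    - Prefer deriving values during render over mirroring them into useState.
    - Before adding a useEffect, read
      https://react.dev/learn/you-might-not-need-an-effect and prefer event
      handlers or computed values where they apply.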
I have been using LLMs since Sonnet 3.5 for large enterprise projects (1M+ lines of code, 1k+ database tables). I just don't ask them to "draw the rest of the owl," as the saying goes.
So? Getting a month's worth of junior-level code in an hour is still unbelievable.
What's the improvement here? I spend more time fixing it than doing it myself anyway. And I have less confidence in the code Opus generates.
I've become convinced that the devs who talk about having to fix the code are the same ones who would make incredibly poor managers. When you manage a team you need to be focused on the effect of the code, not the small details.
This sort of developer, in a pair programming exercise, would find themselves flustered at how a junior approached problem solving and would just fix it themselves. I strongly suspect a loss of the feeling of control is at play here.
What are you fixing?