Comment by AstroBen
1 day ago
It's an impossible thing to disprove. Anything you say can be countered by their "secret workflow" they've figured out. If you're not seeing a huge speedup well you're just using it wrong!
The burden of proof is 100% on anyone claiming the productivity gains
I go to meetups and enjoy myself so much; 80% of people are showing how to install 800000000 MCPs on their 92GB MacBook Pros, new RAG memory, n8n agent flows, super special prompting techniques, secret sauces, killer .md files, special VS Code setups, and after all that they're still no more productive than vanilla Claude Code in a git repo. You get people saying "look, I only have to ask xyz... and it does it! Magic!" Then you type "do xyz" into vanilla CC and it does exactly the same thing, often faster.
This was always the case. People obsessing over keyboards, window managers, emacs setups... always optimizing around the edges of the problem, while all of it eats an incredible amount of their time compared to working on the real problems.
Yes, the thing they realize much later in life is that perhaps they enjoyed the act of gardening (curating your tools, workflows, etc.) much more than farming (being heads-down, focused, and productive on the task at hand).
Sadly gardening doesn’t pay the bills!
Same thing happens in music production. If only I had this guitar, or that synth, or these plugins…
It's the four hobbies all over again: https://brooker.co.za/blog/2023/04/20/hobbies.html
A better keyboard is a hill I will die on.
Yes, this happens quite often. So often that I wonder if it is among the symptoms of some psychiatric or neurological disorder.
That ties in perfectly with my experience. Direct prompts, with limited setup and limited context, seem to work better than, or at least as well as, complex custom GPTs. There are not just diminishing but inverted returns to complexity in GPTs.
Limited prompts work well for limited programs, or for already well-defined and cemented source bases.
Once scope creeps up you need the guardrails of a carefully crafted prompt (and pre-prompts, tool hooks, AGENTS files, the whole gamut) -- otherwise it rapidly turns into cat wrangling.
No, no, you misunderstand: that's still a massive productivity improvement compared to them being on their own, with their own incompetence and refusal to learn how to code properly.
This gets comical when there are people, on this site of all places, telling you that using curse words or "screaming" in ALL CAPS in your agents.md file makes the bot follow orders with greater precision. And these people have "engineer" on their resumes...
There's actually quite a bit of research in this field; here are a couple:
"ExpertPrompting: Instructing Large Language Models to be Distinguished Experts"
https://arxiv.org/abs/2305.14688
"Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks"
https://arxiv.org/abs/2408.08631
Those papers are really interesting, thanks for sharing them!
Do you happen to know of any research papers that explore constraint programming techniques w.r.t. LLM prompts?
For example:
I've been trying to stop the coding assistants from making git commits on their own and nothing has been working.
Hah - I'm the opposite: I want everything done by the AI to be a discrete, clear commit so there is no human/AI entanglement. If you want to squash it later that's fine, but you should have a record of what the AI did. This is Aider's default mode, and it's one reason I keep using it.
Run them in a VM that doesn't have git installed. Sandboxing these things is a good idea anyway.
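A container gets you most of the way there. Purely an illustrative sketch (the image and flags are my own choices, not a recommendation): slim base images ship without git, and --network=none blocks any attempt to push regardless:

    # python:3.12-slim has no git binary, and --network=none
    # cuts the container off from the network entirely
    docker run --rm -it --network=none \
      -v "$PWD":/work -w /work \
      python:3.12-slim bash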
Which coding assistant are you using?
I'm a mild user at best, but I've never once seen the various tools I've used try to make a git commit on their own. I'm curious which tool you're using that's doing that.
Why not use something like Amp Code, which doesn't do that? People seem to rage at CC and similar tools, but Amp Code doesn't go making random commits or dropping databases.
Are you using aider? There's a setting to turn that off.
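From memory (worth double-checking against the docs), it's the auto-commits setting:

    # one-off, on the command line
    aider --no-auto-commits

    # or persistently, in .aider.conf.yml
    auto-commits: false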
Require commits to be signed.
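E.g. something along these lines in the repo the agent works in; without access to your signing key, its commits fail. A sketch, not a complete lockdown - a determined agent could still override the setting:

    # require every commit in this repo to be signed
    git config commit.gpgsign true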
Don't give them a credential/permission that allows it?
Wasn't Cursor or someone using one of these horrifying prompts? Something about having to do a good job or they won't be paid, and then they won't be able to afford their mother's cancer treatment, and then she'll die?
How is this any different from the Apple "you're holding it wrong" argument? The critical reason that kind of response is so out of touch is that the same people praise Apple for its intuitive nature. How can any reasonable and rational person (especially an engineer!) not see that these two beliefs are in direct opposition?
If "you're holding it wrong" then the tool is not universally intuitive. Sure, there'll always be some idiot trying to use a lightbulb to screw in a nail, but if your nail has threads on it and a notch on the head then it's not the user's fault for picking up a screwdriver rather than a hammer.
What scares me about ML is that many of these people have "research scientist" in their titles. As a researcher myself I'm constantly stunned at people not understanding something as basic as who has the burden of proof. Fuck off. You're the one saying we made a brain by putting lightning into a rock and shoving tons of data into it. There's so much about that that I'm wildly impressed by. But to call it a brain in the same way you'd call a human brain one requires significant evidence. Extraordinary claims require extraordinary evidence. There's some incredible evidence here, but an incredible lack of scrutiny over whether it's actually evidence for something else.
I'd say such hacks don't make you an engineer, but they are definitely part of engineering anything that has to do with LLMs. With overly long system prompts/agents.md files not working well, it definitely makes sense to optimize the existing prompt with minimal additions. And if swear words, screaming, shaming, or tipping works, well, that's the most token-efficient optimization of a brief, well-written prompt.
Also, of course, current agents already have the ability to run endlessly if they are well instructed; steering them away from reward hacking over the long term definitely IS engineering.
Or how about telling them they are working in an orphanage in Yemen that's struggling for money, but luckily they've got an MIT degree and are now programming to raise money. But their supervisor is a psychopath who doesn't like their effort and wants children to die, so the work has to be done as diligently as possible, and each step has to be viewed through the lens that the supervisor might find something to use to forbid programming.
Look, as absurd as it sounds, a variant of that scenario works extremely well for me. Just because it's plain language doesn't mean it can't be engineering; at least I'm of the opinion that it definitely is, if it has an impact on what the possible use cases are.
> cat AGENTS.md
WRITE AMAZING INCREDIBLE VERY GOOD CODE OR ILL EAT YOUR DAD
...yeah, I've heard the "threaten it and it'll write better code" one too.
I know you're joking, but to contribute something constructive here: most models now have guardrails against being threatened. So if you threaten them, it would have to be with something out of your control, like "... or the already depressed code reviewer might kill himself and his wife. We did everything in our control to take care of him, but do the best on your part to avoid the worst case."
> makes the bot follow orders with greater precision.
Gemini will ignore any direction to never reference or use YouTube videos, no matter how many ways you tell it not to. It may remove them if you ask, though.
Positive reinforcement works better than negative reinforcement. If you read the prompt guidance from the companies themselves in their developer documentation, it often makes this point: it is more effective to tell models what to do than what not to do.
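A made-up illustration of the rephrasing (not taken from any official guide):

    Weaker: "NEVER reference YouTube videos. Do not include video links."
    Better: "Cite text sources only: official documentation, papers, or articles."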
Works on human subordinates too, kinda, if you don't mind the externalities…
Except that is demonstrably true.
Two things can be true at the same time: I get value and a measurable performance boost from LLMs, and their output can be so stupid/stubborn sometimes that I want to throw my computer out the window.
I don't see what is new, programming has always been like this for me.
Yes, using tactics like front-loading important directives, emphasizing extra-important concepts, and flagging things that should be double- or even triple-checked for correctness because of their expected intricacy makes sense for human engineers as well as "AI" agents.
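A purely illustrative sketch of that front-loaded shape (the wording and the task are made up):

    MOST IMPORTANT: do not change any public API signatures.
    Double-check all arithmetic on money amounts (integer cents, no floats).
    ---
    Task: add a refund endpoint to the billing service ...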
"don't make mistakes" LMAO
There's no secret, IMO. It's actually really simple to get good results: you just expect the same things from the LLM that you would from a junior. Use an MD file to force it to:
1) Include good comments in whatever style you prefer, document everything it's doing as it goes, keep the docs up to date, and include configurable logging.
2) Write and actually execute unit tests for everything before it's allowed to commit anything, again through the MD file.
3) Learn from its mistakes: any time it screws up, tell it to add a rule to its own MD file reminding it never to repeat that mistake. Over time the MD file gets large, but the error rate plummets.
4) This is where it drifts from treating it as a standard junior: YOU must manually verify that the unit tests are testing for the right thing. I usually add a rule to the MD file telling it not to touch them once I'm happy with them, but even then you must check that the agent didn't change them the first time it hit a bug. Modern LLMs are now worse at this for some reason, probably because they're getting smart enough to cheat.
If you do these basic things you'll get good results almost every time. (A sketch of such a file follows below.)
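A minimal sketch of what such a rules file might look like; the file name, wording, and the FROZEN convention are all made up for illustration:

    # rules.md (illustrative)
    - Comment every non-obvious function; keep docs/ in sync with code changes.
    - Include configurable logging (level set via env var) in new modules.
    - Write unit tests for all new code and actually run them before any commit.
    - Never modify a test marked FROZEN; ask for review instead.

    ## Mistake log (append one rule per mistake)
    - Never "fix" a failing test by weakening its assertion.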
> This is where it drifts from being treated as a standard Junior. YOU must manually verify that the unit tests are testing for the right thing.
You had better juniors than me. What unit tests? :P
The MD file is a spec sheet, so now you're expecting every warm body to be a Sr. engineer. But where do you start as a junior warm body? Reviewing code, writing specs, reviewing implementation details... that's all Sr.-level stuff.
It's impossible to prove in either direction. AI benchmarks suck.
Personally, I like using Claude (for the things I'm able to make it do, and not for the things I can't), and I don't really care whether anyone else does.
I'd just like to see a live coding session from one of these 10x AI devs
Like genuinely. I want to get stuff done 10x as fast too
My wife used to be a professional streamer, so I know how distracting it can be to try to entertain an audience. So when I attempted to become one of these 10x AI devs over my Christmas vacation, I did not live stream. But I did make a bunch of atomic commits and push them up to sourcehut. Perhaps you'll find that helpful?
Just Christmas Vacation (12-18h days): https://git.sr.ht/~kerrick/ratatui_ruby/log/v0.8.0
Latest (slowed down by job & real life): https://git.sr.ht/~kerrick/ratatui_ruby/log/trunk and https://git.sr.ht/~kerrick/ratatui_ruby-wiki/log/wiki and https://git.sr.ht/~kerrick/ratatui_ruby-tea/log/trunk
But the benefit might not be speed, it might be economy of attention.
I can code with Claude when my mind isn't fresh. That adds several hours of time I can schedule, where previously I had to do fiddly things when I was fresh.
What I can attest to is that I used to have a backlog of things I wanted to fix but hadn't gotten around to. That's now gone, and it vanished a lot faster than the half a year I had thought it would take.
I'd also like to see how it compares to their coding without AI.
I mean, I really need to understand what the "x" is in 10x. If their x is <0.1, then who gives a shit. But if their x is >2, then holy fuck, I want to know.
Who doesn't want to be faster? But it's not like x is the same for everybody.
I don't think any serious dev has claimed 10x as a general statement. Obviously, no true Scotsman and all that, so even my statement about the makers of anecdotal statements is anecdotal.
Even as a slight fan, I'd never claim more than 10-20% all together. I could maybe see 5x for some specific typing-heavy usages, like adding basic CRUD stuff for a basic entity to an already existing Spring app.
Obviously, there has to be huge variability between people based on initial starting conditions.
It is like if someone says they are losing weight eating 2500 calories a day and someone else says that is impossible because they started eating 2500 calories and gained weight.
Neither are making anything up or being untruthful.
What is strange to me is that smart people can't see something this obvious.
> I want to get stuff done 10x as fast too
I don’t. I mean I like being productive but by doing the right thing rather than churning out ten times as much code.
I’d really like to see a 10x ai dev vs a 10x analog dev
Theo, the YouTuber who also runs T3.chat, always makes videos about how great coding agents are, then he'll try to do something on stream and it ALWAYS fails massively, and he's always like "well, it wasn't like this when I did it earlier."
Sure buddy.
> AI benchmarks suck.
Not only do they suck, but it's essentially an impossible task, since there is no frame of reference for what "good code" looks like.
Many of them are also burning through absurd token limits - like running 10 Claudes at once and leaving them running continuously to "brute force" solutions out. It may be possible, but it's not really an acceptable workflow for serious development.
> but it's not really an acceptable workflow for serious development.
At what cost do you see this as acceptable? For example, how many hours of saved human development time is worth one hour of salary spent on LLM tokens, funded by the developer? And then, what's acceptable if it's funded by the employer?
I guess there are two main concerns I have with it.
One is technical: I don't believe that when you are grinding out huge amounts of code with little to no supervision, you can claim to be exercising the appropriate amount of engineering oversight over what it is doing. Just as if a junior dev showed up having entirely re-engineered an application over the weekend and presented it back to me, I would probably reject it wholesale. My gut feeling is this is creating huge long-term problems with what is coming out of it.
The other is that I'm concerned a vast amount of the "cost" is currently externalised. Whatever you are paying for tokens quite likely bears no resemblance to the real cost, either because the provider is subsidising it or the environment is. I'm not at all against using LLMs to save work at a reasonable scale. But if it comes down to a single person increasing their productivity by grinding through stupendous amounts of non-productive LLM output that is thrown away (you don't care if it sits there all day going around in circles as long as it eventually finds the right solution), I think there's a moral responsibility to use the resources better.
We get a $1,000/month budget; just about every dev uses it for 5 Claude accounts.
We had the fabled 10x engineer long before, and independent of, agentic coding. Some people claim it's real, others claim it's not, with much the same conviction. If something that should be so clear-cut is debatable, why would anyone now be able to produce a convincing, discussion-resolving argument for (or against) agentic coding? We don't even manage to do that for tabs vs. spaces.
The reason neither can be resolved in a forum like this is that coding output is hard to reason about for various reasons, and people want it to be hard to reason about.
I would like to encourage people to think that the burden of proof always falls on themselves, to themselves. Managing not to be convinced in an online forum (regardless of topic or where you land on the issue) is not hard.
They remind me so much of that group of people who insist the scammy magnetic bracelets[1] "balance their molecules" or something, making them more efficient/balanced/productive/energetic/whatever. They are also impossible to argue with, because "I feel more X" is damn near impossible to disprove.
[1] https://en.wikipedia.org/wiki/Power_Balance , https://en.wikipedia.org/wiki/Hologram_bracelet , https://en.wikipedia.org/wiki/Ionized_jewelry
I mean, a DSL packed full of features, a full LSP, DAP for step debugging, profiling, etc.
https://github.com/williamcotton/webpipe
https://github.com/williamcotton/webpipe-lsp
https://github.com/williamcotton/webpipe-js
Take a look at my GitHub timeline for an idea of how little time this took for a solo dev!
Sure, there’s some tech debt but the overall architecture is pretty extensible and organized. And it’s an experiment. I’m having fun! I made my own language with all the tooling others have! I wrote my own blog in my own language!
One of us, one of us, one of us…
Ah, the "then you are doing it wrong" defence.
Also, you have to learn it right now, because otherwise it will be too late and you will be outdated, even though it is allegedly improving very fast.
TBF, there are lots of tools that work great but most people just can't use.
I personally can't use agentic coding, and I'm reasonably convinced the problem is not with me. But it's not something you can completely dismiss.
> Also, you have to learn it right now, because otherwise it will be too late and you will be outdated, even though it is improving very fast allegedly.
This in general is a really weird behaviour that I come across a lot, I can't really explain it. For example, I use Python quite a lot and really like it. There are plenty of people who don't like Python, and I might disagree with them, but I'm not gonna push them to use it ("or else..."), because why would I care? Meanwhile, I'm often told I MUST start using AI ("or else..."), manual programming is dead, etc... Often by people who aren't exactly saying it kindly, which kind of throws out the "I'm just saying it out of concern for you" argument.
Fear of missing out, and maybe also a bit of religious-esque fervor...
Tech is weird; we have so many hype cycles: big data, web3, NFTs, blockchain (I once had an acquaintance who quit his job to study blockchain because soon "everything will be built on it"), and now "AI"... there's usefulness in all of them, but it gets blown out of proportion, IMO.
Yeah. It sounds like those pitches letting you in on the secret trick to tons of passive income.
That one's my favorite. You can't defend against it; it just shuts down the conversation. Odds are, you aren't doing it wrong. These people are usually suffering from Dunning-Kruger at best, or they're paid shills/bots at worst.
The best part of being dumb is thinking you're smart. The best part of being smart is knowing you're smart. Just don't be in the IQ range where you know you're dumb.
People say it takes at least 6 months to learn how to use LLMs effectively, while at the same time the field is changing incredibly fast, while at the same time agents were useless until Opus 4.5.
Which is it? lol
I used it with practically zero preparation. If you've got a clue, then it's fairly obvious what you need to do. You could focus on meta stuff, like finding out what it is good or bad at, but that can be done along the way.
If you had negative results using anything more than 3 days old, then it's your fault, your results mean nothing because they've improved since then. /s
> The burden of proof is 100% on anyone claiming the productivity gains
IMHO, this is just going to go away. Until recently I was using Copilot in my IDE or the chat interface in my browser, and I was severely underwhelmed. Gemini kept generating incorrect code that, when pasted, didn't compile, and the process was just painful and a brake on productivity.
Recently I started using the Claude Code CLI on their latest Opus model. The difference is astounding. I can give you more details on how I am working with this if you like, but for the moment my main point is that the Claude Code CLI, with access to run the tests, run the apps, edit files, etc., has made me pretty excited.
And my opinion has now changed because "this is the worst it will be" and I'm already finding it useful.
I think within 5 years, we won't even be having this discussion. The use of coding agents will be so prolific and obviously beneficial that the debate will just go away.
(all in my humble opinion)
So will all the tech jobs in the US. When it gets that good, you can farm the work out to some other country for much cheaper.
I'm not sure. Possibly?
I'm still doing most of my coding by hand, because I haven't yet committed. But even for the stuff I'm doing with Claude, I'm still doing a lot of the thought work and steering it toward better designs. It takes an experienced dev to recognize the better designs, just as it always has.
Maybe this eventually changes and the coding agents get as good at that part too; I don't know. But I do know it is an enabler for me at the moment, and I have 20+ years of experience writing C++ and then Java in the finance industry.
I'm still new to Claude, and I'm sure I'm going to run up against some walls soon on the more complicated stuff (haven't tried that yet). But everyone ends up working on tasks they don't find that challenging - just lots of manual keypresses to get the code into the IDE. Claude so far is making that a better experience, for me at least.
(Example: plumbing in new message types on our bus and wiring in the logic to handle them - not complicated, it just sits on top of complicated stuff.)
People claiming productivity gains do not have to prove anything to anyone. A few are trying to open others' eyes, but my guess is that will eventually stop. They will, though, be among the few still left doing this SWE work in the near future :)
The responses are always to check your prompts and ensure you are using frontier models, along with a warning about how you will quickly be made redundant if you don't lift your game.
AI is generally useful, and very useful for certain tasks. It's also not initiating the singularity.