Comment by drewbug01
1 day ago
> If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
I struggle to take comments like this seriously - yes, it is very reasonable to expect these magical tools to copy and paste something without alterations. How on earth is that an unreasonable ask?
The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
What, precisely, are they good for?
I think what they're best at right now is the initial scaffolding work of projects. A lot of the annoying bootstrap shit that I hate doing is actually generally handled really well by Codex.
I agree that there's definitely some overhype to them right now. At least for the stuff I've done they have gotten considerably better though, to a point where the code they generate is often usable, if sub-optimal.
For example, about three years ago, I was trying to get ChatGPT to write me a fairly basic ZeroMQ program in C. It generated something that looked correct, but it would crash pretty much immediately, because it kept trying to use a pointer after freeing it.
I tried the same thing again with Codex about a week ago, and it worked out of the box, and I was even able to get it to do more stuff.
I think it USED to be true that you couldn't really use an LLM on a large, existing codebase. Our codebase is about 2 million LOC, and a year ago you couldn't use an LLM on it for anything but occasional small tasks. Now, probably 90% of the code I commit each week was written by Claude (and reviewed by me and other humans - and also by Copilot and ZeroPath).
It seems like just such a weird and rigid way to evaluate it? I am a somewhat reasonable human coder, but I can't copy and paste a bunch of code without alterations from memory either. Can someone still find a use for me?
For a long time, I've wanted to write a blog post on why programmers don't understand the utility of LLMs[1], whereas non-programmers easily see it. But I struggle to articulate it well.
The gist is this: Programmers view computers as deterministic. They can't tolerate a tool that behaves differently from run to run. They have a very binary view of the world: If it can't satisfy this "basic" requirement, it's crap.
Programmers have made their career (and possibly life) being experts at solving problems that greatly benefit from determinism. A problem that doesn't? Well, that needs to be solved either by sophisticated machine learning or by a human. They're trained to essentially ignore those problems; it's not their expertise.
And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem.
For everyone else, the world, and its solutions, are mostly non-deterministic. When they solve a problem, or when they pay people to solve a problem, the guarantees are much lower. They don't expect perfection every time.
When a normal human asks a programmer to make a change, they understand that communication is lossy, and even if it isn't, programmers make mistakes.
Using a tool like an LLM is like any other tool. Or like asking any other human to do something.
For programmers, it's a cardinal sin if the tool is unpredictable. So they dismiss it. For everyone else, it's just another tool. They embrace it.
[1] This, of course, is changing as they become better at coding.
My problem isn't lack of determinism, it's that its solution frequently has basic errors that prevent it from working. I asked ChatGPT for a program to remove the background of an image. The resulting image was blue. When I pointed this out to ChatGPT it identified this as a common error in RGB ordering in OpenCV and told me the code to change. The whole process did not take very long, but this is not a cycle I want to be part of. (That, and it does not help me much to be given a basic usage of OpenCV that does not work for the complex background I wanted to remove.)
Then there are the cases where I just cannot get it to do what I ask. Ask Gemini to remove the background of an image and you get a JPEG with a baked-in checkerboard background, even when you tell it to produce an RGBA PNG. Again, I don't have any use for that.
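For anyone who hasn't hit this one: the blue tint is what you'd expect when a pixel stored in OpenCV's default BGR channel order gets treated as RGB. A minimal sketch in plain Python, no OpenCV involved; the helper name `swap_red_blue` and the sample colour are made up for illustration.

```python
def swap_red_blue(pixel):
    """Reverse the channel order of a 3-channel pixel (BGR <-> RGB)."""
    first, green, last = pixel
    return (last, green, first)


# An orange pixel, written in RGB order.
orange_rgb = (255, 165, 0)

# The same pixel as a BGR-ordered library would store it.
orange_bgr = swap_red_blue(orange_rgb)

# Bug: passing the BGR tuple to code that assumes RGB.
# Red and blue trade places, so orange shows up as a shade of blue.
misread_as_rgb = orange_bgr

# Fix: convert back to RGB before handing the pixel on.
fixed_rgb = swap_red_blue(orange_bgr)
```

With real OpenCV arrays the equivalent conversion would be `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`. This only illustrates the channel-order mistake, not the background removal itself.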
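One way to check whether a file you got back really is an RGBA PNG, rather than a JPEG with the checkerboard painted into the pixels, is to look at the first few bytes. This is a rough sketch using only the standard library; `describe_image_header` and `make_png_header` are hypothetical helper names, and it deliberately ignores palette PNGs that carry transparency in a separate `tRNS` chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def describe_image_header(data: bytes) -> str:
    """Classify an image by its leading bytes (sketch, not a full parser)."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg (no alpha channel)"
    is_png = data.startswith(PNG_SIGNATURE) and data[12:16] == b"IHDR"
    if is_png and len(data) >= 26:
        # IHDR layout: width(4) height(4) bit_depth(1) color_type(1) ...
        color_type = data[25]
        # Color type 4 = grayscale + alpha, 6 = truecolor + alpha (RGBA).
        if color_type in (4, 6):
            return "png with alpha"
        return "png without alpha channel"
    return "unknown"


def make_png_header(color_type: int) -> bytes:
    """Build just the signature and IHDR of a 1x1 PNG, for demonstration."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
```

On a real file you would pass in something like the first 32 bytes read in binary mode. A "png with alpha" result only says an alpha channel exists, not that the background was actually made transparent.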
But it does know a lot of things, and sometimes it informs me of solutions I was not aware of. The code isn't great, but if I were non-technical (or not very good), this would be fantastic and better than I could do.
I’m perfectly happy for my tooling to not be deterministic. I’m not happy for it to make up solutions that don’t exist, and get stuck in loops because of that.
I use LLMs, I code with a mix of Antigravity and Claude Code depending on the task, but I feel like I’m living in a different reality when the code I get out of these tools _regularly just doesn’t work, at all_. And to the parent’s point, I’m doing something wrong for noticing that?
If it were terrible, you wouldn't use them, right? Isn't the fact that you continue to use AI coding tools a sign that you find them a net positive? Or is it being imposed on you?
> And to the parent’s point, I’m doing something wrong for noticing that?
There's nothing wrong with pointing out your experience. What the OP was implying was that he expects them to be able to copy/paste reliably almost 100% of the time, and not hallucinate. I was merely pointing out that he'll never get that with LLMs, and that their inability to do so isn't a barrier to getting productive use out of them.
> And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem
Ah, no. This is wildly off the mark, but I think a lot of people don't understand what SWEs actually do.
We don't get paid to write code. We get paid to solve problems. We're knowledge workers like lawyers or doctors or other engineers, meaning we're the ones making the judgement calls and making the technical decisions.
In my current job, I tell my boss what I'm going to be working on, not the other way around. That's not always true, but it's mostly true for most SWEs.
The flip side of that is I'm also held responsible. If I write ass code and deploy it to prod, it's my ass that's gonna get paged for it. If I take prod down and cause a major incident, the blame comes to me. It's not hard to come up with scenarios where your bad choices end up costing the company enormous sums of money. Millions of dollars for large companies. Fines.
So no, it has nothing to do with non-determinism lol. We deal with that all the time. (Machine learning is decades old, after all.)
It's evaluating things, weighing the benefits against the risks and failure modes, and making a judgement call that it's ass.
> What, precisely, are they good for?
scamming people
Also good for manufacturing consent on Reddit and other places. Intelligence services are busy with a certain country now, with bots using LLMs to pump out insane amounts of content to mold the information atmosphere.
It's strong enough to replace humans at their jobs and weak enough that it can't do basic things. It's a paradox. Just learn to be productive with them. Pay $200/month and work around its little quirks. /s