Comment by candiddevmike

3 months ago

I don't understand why generative AI gets a pass at constantly being wrong, but an average worker would be fired if they performed the same way. If a manager needed to constantly correct you or double check your work, you'd be out. Why are we lowering the bar for generative AI?

36 comments


latchup 3 months ago

Multiple reasons:

* Gen AI never disagrees with or objects to boss's ideas, even if they are bad or harmful to the company or others. In fact, it always praises them no matter what. Brenda, being a well-intentioned human being, might object to bad or immoral ideas to prevent harm. Since boss's ego is too fragile to accept criticism, he prefers gen AI.

* Boss is usually not qualified, willing, or free to do Brenda's job to the same quality standard as Brenda. This compels him to pay Brenda and treat her with basic decency, which is a nuisance. Gen AI does not demand fair or decent treatment and (at least for now) is cheaper than Brenda. It can work at any time and under conditions Brenda refuses to. So boss prefers gen AI.

* Brenda takes accountability for and pride in her work, making sure it is of high quality and as free of errors as she can manage. This is wasteful: boss only needs output that is good enough to make it someone else's problem, and as fast as possible. This is exactly what gen AI gives him, so boss prefers gen AI.

bigfishrunning 3 months ago

Your third point is especially poignant. It points out that AI doesn't just take a job here, but it makes everything worse.
I wish I could upvote this comment twice.
thewarrior 3 months ago

You’re absolutely right!

basscomm 3 months ago

My kneejerk reaction is the sunk cost fallacy (AI is expensive), but I'm pretty sure it's actually because businesses have spent the last couple of decades doing absolutely everything they can to automate as many humans out of the workforce as possible.

martin-t 3 months ago

Because it's much cheaper.

So now you don't have to pay people to do their actual work, you assign the work to ML ("AI") and then pay the people to check what it generated. That's a very different task, menial and boring, but if it produces more value for the same amount of input money, then it's economical to do so.

And since checking the output is often a lower skilled job, you can even pay the people less, pocketing more as an owner.

anon721656321 3 months ago

If a worker could be right 50% of the time, get paid 1 cent to write a 5000-word essay on a random topic, and do it in less than 30 seconds, then I think managers would be fine hiring that worker at that rate as well.

cryptonym 3 months ago
5000 half-right words is worthless output. That can even lead to negative productivity.
hitarpetar 3 months ago

great, now who are you paying to sort the right output from the wrong output?

Levitz 3 months ago

There's a variety of reasons.

You don't have a human to manage. The relationship is completely one-sided: you can query a generative AI at 3 in the morning on New Year's Eve. This entity has no emotions to manage and no interests of its own.

There's cost.

There's an implicit promise of improvement over time.

There's the inhumanly wide domain of expertise. You can ask about cookies right now, then about 12th-century France, then about biochemistry.

The fact that an average worker would be fired for performing the same way is exactly where the human competes: they carry responsibility, which is not something AI can offer. If, say, Anthropic actually signed contracts stating that it is liable for any mistakes, then humans would be absolutely toast.

ryandrake 3 months ago

I've been trying to open my mind and "give AI a chance" lately. I spent all day yesterday struggling with Claude Code's utter incompetence. It behaves worse than any junior engineer I've ever worked with:

- It says it's done when its code does not even work, sometimes when it does not even compile.

- When asked to fix a bug, it confidently declares victory without actually having fixed the bug.

- It gets into this mode where, when it doesn't know what to do, it just tries random things over and over, each time confidently telling me "Perfect! I found the error!" and then waiting for the inevitable response from me: "No, you didn't. Revert that change".

- Only when you give it explicit, detailed commands, "modify fade_output to be -90," will it actually produce decent results, but by the time I get to that level of detail, I might as well be writing the code myself.

To top it off, unlike the junior engineer, Claude never learns from its mistakes. It makes the same ones over and over and over, even if you include "don't make XYZ mistake" in the prompt. If I were an eng manager, Claude would be on a PIP.

sswatson 3 months ago
Recently I've used Claude Code to build a couple of TUIs that I've wanted for a long time but couldn't justify the time investment to write myself.
My experience is that I think of a new feature I want, I take a minute or so to explain it to Claude, press enter, and go off and do something else. When I come back in a few minutes, the desired feature has been implemented correctly with reasonable design choices. I'm not saying this happens most of the time, I'm saying it happens every time. Claude makes mistakes but corrects them before coming to rest. (Often my taste will differ from Claude's slightly, so I'll ask for some tweaks, but that's it.)
The takeaway I'm suggesting is that not everyone has the same experience when it comes to getting useful results from Claude. Presumably it depends on what you're asking for, how you ask, the size of the codebase, how the context is structured, etc.
- hunterpayne 3 months ago
  
  It's great for demos, it's lousy for production code. The different cost of errors in these two use cases explains (almost) everything about the suitability of AI for various coding tasks. If you are the only one who will ever run it, it's a demo. If you expect others to use it, it's not.
  
simonw 3 months ago
Learning to use Claude Code (and similar coding agents) effectively takes quite a lot of work.
Did you have it creating and running automated tests as it worked?
- 9rx 3 months ago
  
  > Learning to use Claude Code (and similar coding agents) effectively takes quite a lot of work.
  I've tried to put in the work. I can even get it working well for a while. But then all of a sudden it is like the model suffers a massive blow to the head and can't produce anything coherent anymore. Then it is back to the drawing board, trying all over again.
  It is exhausting. The promise of what it could be is really tempting fruit, but I am at the point that I can't find the value. The cost of my time to put in the work is not being multiplied in return.
  > Did you have it creating and running automated tests as it worked?
  Yes. I work in a professional capacity. This is a necessity regardless of who (or what) is producing the product.
yfontana 3 months ago
> - It says it's done when its code does not even work, sometimes when it does not even compile.
> - When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
You need to give it ways to validate its work. A junior dev will also hand you code that doesn't compile, or a "fix" that doesn't actually fix the bug, if they never compile the code and test that the bug is truly fixed.
- ryandrake 3 months ago
  
  Believe me, I've tried that, too. Even after giving detailed instructions on how to validate its work, it often fails to do it, or it follows those instructions and still gets it wrong.
  Don't get me wrong: Claude seems to be very useful if it's on a well-trodden train track and never has to go off the tracks. But it struggles when its output is incorrect.
  The worst behavior is this "try things over and over" behavior, which is also very common among junior developers and is one of the habits I try to break real humans of, too. I've gone so far as to put this into the root CLAUDE.md system prompt:
  --NEVER-- try fixes that you are not sure will work.
  --ALWAYS-- prove that something is expected to work and is the correct fix, before implementing it, and then verify the expected output after applying the fix.
  ...which is a fundamental thing I'd ask of a real software engineer, too. Problem is, as an LLM, it's just spitting out probabilistic sentences: it is always 100% confident of its next few words, which makes it a poor investigator.
hitarpetar 3 months ago

yOu'Re HoLdInG iT wRoNg

amscanne 3 months ago

It’s much cheaper than Brenda (superficially, at least). I’m not sure a worker that costs a few dollars a day would be fired, especially given the occasional brilliance they exhibit.

BeFlatXIII 3 months ago

How much does the compute cost for the AI to do Brenda's job? Not total AI spend, but the fraction that replaced Brenda. That's why they'd fire a human but keep using the AI.

simonw 3 months ago

Brenda has been kissed on her forehead by the Excel goddess herself. She is irreplaceable.
(More seriously, she also has 20+ years of institutional knowledge about how the company works, none of which has ever been captured anywhere else.)
Covenant0028 3 months ago

Brenda's job involves being accountable for the output. In many types of jobs, posting false numbers would make her liable to dismissal, a lawsuit, or even jail.
I'd like to see the cost of a model where the model provider (Anthropic etc) can assume that kind of financial and legal accountability.
To the extent that this output is possible only when Anthropic is not held to the same standard as Brenda, we will need to conclude that the cost savings come from the reduced liability standard rather than from the technical capabilities of the model.
mrgoldenbrown 3 months ago

It's not just compute, it's also the setup costs: how much did you have to pay someone to feed the AI Brenda's decades of knowledge specific to her company, and all the little special cases of how it does business?

Esophagus4 3 months ago

Because it doesn’t have to be as accurate as a human to be a helpful tool.

That is precisely why we have humans in the loop for so many AI applications.

If [AI + human reviewer to correct it] is some multiple more efficient than [human alone], there is still plenty of value.

bigstrat2003 3 months ago
> Because it doesn’t have to be as accurate as a human to be a helpful tool.
I disagree. If something can't be as accurate as a (good) human, then it's useless to me. I'll just ask the human instead, because I know that the human is going to be worth listening to.
- Esophagus4 3 months ago
  
  Autopilot in airplanes is a good example to disprove that.
  Good in most conditions. Not as good as a human. Which is why we still have skilled pilots flying planes, assisted by autopilot.
  We don’t say “it’s not as good as a human, so stuff it.”
  We say, “it’s great in most conditions. And humans are trained how to leverage it effectively and trained to fly when it cannot be used.”
  

danans 3 months ago

> Why are we lowering the bar for generative AI?

Because it doesn't need to sleep or spend time with its family.

Covenant0028 3 months ago

Gen AI doesn't just get a pass at being wrong. It gets a pass for everything.

Look at Grok. If a human employee went around sexually harassing their CEO in public and giving themselves a Hitler nickname, they'd be fired immediately and face criminal charges. In the case of Grok, the CEO had to quit the company after being sexually harassed.

We've not lowered the bar for AI, we've removed it entirely.