Comment by MeetingsBrowser
4 hours ago
You (in theory) have more control over the quality of the team you are managing than over the quality of the models you are using.
And the quality of the code models put out is, in general, well below the average output of a professional developer.
It is, however, much faster, which makes the gambling loop feel better. Buying and holding a stock for a few months doesn't feel the same as playing a slot machine.
One difference is that those developers are moral subjects who feel bad if they screw up, whereas a computer is not a moral subject and can never be held accountable.
https://simonwillison.net/2025/Feb/3/a-computer-can-never-be...
Right, you need to hire a scapegoat. Usually the tester has that role: little influence, but huge responsibility for quality.
You have a lot of control over LLM quality. There are different models available, and even within a single model, different effort settings produce different outcomes.
E.g. look at the "SWE-Bench Pro (public)" heading on this page: https://openai.com/index/introducing-gpt-5-4/ , showing reasoning efforts from none to high.
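For what it's worth, that knob is literally an API parameter. A minimal sketch using the OpenAI Python SDK (the model name is a placeholder, and the exact set of supported effort values varies by model):

    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Same prompt, three different effort settings -- three different
    # cost/latency/quality trade-offs from the "same" model.
    for effort in ("low", "medium", "high"):
        response = client.chat.completions.create(
            model="gpt-5",            # placeholder; use whichever reasoning model you have
            reasoning_effort=effort,  # the quality knob discussed above
            messages=[{"role": "user", "content": "Fix the race condition in this queue."}],
        )
        print(effort, "->", (response.choices[0].message.content or "")[:100])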
Of course, they don't learn like humans, so you can't do the trick of hiring someone less senior but with great potential and then mentoring them. Instead it's more of an up-front price you have to pay. The top models at the highest settings obviously form a ceiling, though.
You also have control over the workflow they follow and the standards you expect them to stick to, through multiple layers of context. Expecting a model to understand your workflow and standards without making the effort to write them down is like expecting a new hire to know them without any onboarding. Allowing bad AI code into your production pipeline is a skill issue.
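And "multiple layers of context" can be as mundane as concatenating the documents where those standards live into every prompt. A hypothetical sketch (the file paths and layering scheme are invented for illustration):

    # Hypothetical: stack written-down standards into the context an
    # agent sees on every task, from org-wide rules down to repo specifics.
    from pathlib import Path

    CONTEXT_LAYERS = [
        "docs/engineering-standards.md",  # org-wide rules (invented path)
        "docs/team-workflow.md",          # team-level workflow (invented path)
        "AGENTS.md",                      # repo-level conventions
    ]

    def build_system_prompt(task: str) -> str:
        layers = [Path(p).read_text() for p in CONTEXT_LAYERS if Path(p).exists()]
        return "\n\n".join(layers + ["Task:", task])

The point is the same as onboarding: if it isn't written down, neither a new hire nor a model can be expected to follow it.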
Imagine you opened a job posting and had all applicants complete SWE-bench.
Ignoring the useless/unqualified candidates and models, human applicants have a much wider range of talent for you to choose from than the top models + tooling.
The frontier models + tooling are, in the grand scheme of things, basically equivalent at any given moment.
Humans can be just as bad as the worst models, but models are nowhere near as good as the best humans.
What theory is that?
My experience is the absolute opposite. I am much more in control of quality with AI agents.
I am never letting juniors or mid-levels onto my team again.
In fact, I am not sure I will allow any form of manual programming in a year or so.
> I am never letting juniors or mid-levels onto my team again
Exactly. You control the quality of the people on your team. You can train, fire, hire, etc., until you get the skill level you want.
You have effectively no control over the quality of the output from an LLM. You get what the frontier labs give you and must work with that.
That is not correct.
It is much easier to control the quality of an AI than that of inexperienced developers.
Eh. You want a good mix of experience levels; what really matters is that everyone is talented. Less experienced colleagues are unburdened by yesterday's lessons that may no longer be relevant today, and they don't have the same blind spots.
Also, our profession is doomed if we don't give less experienced colleagues a chance to shine.
Our profession is likely doomed not because we don't train people, but because of a lack of demand.