Comment by input_sh

21 hours ago

Even speaking from a pure statistical perspective, it is quite literally impossible for "AI" that outputs world's-most-average-answer to be better than "most engineers".

In fact, it's pretty easy to conclude what percentage of engineers it's better than: all it does is it consumes as much data as possible and returns the statistically most probable answer, therefore it's gonna be better than roughly 50% of engineers. Maybe you can claim that it's better than 60% of engineers because bottom-of-the-barrel engineers tend to not publish their works online for it to be used as training data, but for every one of those you have a bunch of non-engineers that don't do this for a living putting their shitty attempts at getting stuff done using code online, so I'm actually gonna correct myself immediately and say that it's about 40%.

The same goes for every other output: it's gonna make the world's most average article, the most average song in a genre and so on. You can nudge it to be slightly better than the average with great effort, but no, you absolutely cannot make it better than most.

Which points to an unknown: code quality evaluation during training. Do you know if there is any sort of code-quality evaluation applied to the training data? The argument seems a little reductive without knowing the actual details of the model's training input pipeline, and of the stages that generate the output along that same dimension. But I don't have any concrete knowledge here either, so your baseline assumption could be right.

The thing that separates AI Agents from normal programmers is that agents don't get bored or tired.

For most engineers the ability might be there, but the motivation or willingness to write, for example, 20 different test cases verifying that the 3-line bug you just fixed really is fixed FOR SURE usually isn't. You add maybe 1-2 tests, because they're annoying boilerplate crap to write, and create the PR. CI passes, you added new tests, someone will approve it. (Yes, your specific company is of course better than this and requires rigorous testing, but the vast majority isn't. Most don't even add the two tests as long as the issue is fixed.)

An AI Agent will happily and without complaining use Red/Green TDD on the issue, create the 20 tests first, make sure they fail (as they should), fix the issue and then again check that all tests pass. And it'll do it in 30 minutes while you do something else.
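The loop described above can be sketched concretely. This is a minimal illustration only, with a made-up 3-line bug (an off-by-one in an inclusive range check) and made-up test cases; nothing here comes from a real agent's output:

```python
# Hypothetical green-phase result of the Red/Green TDD loop described above.
# Red phase: the buggy version used `value < high`, so the upper bound was
# wrongly excluded and the boundary cases below failed.
# Green phase: use `<=` so the range is inclusive on both ends.

def in_range(value, low, high):
    """Return True if value lies in the inclusive range [low, high]."""
    return low <= value <= high

# The agent writes far more small cases than a human would bother with,
# confirms the boundary ones fail against the buggy version, then fixes
# the code and re-runs until everything passes.
cases = [
    ((5, 1, 10), True),
    ((1, 1, 10), True),     # lower bound inclusive
    ((10, 1, 10), True),    # upper bound inclusive -- failed before the fix
    ((0, 1, 10), False),
    ((11, 1, 10), False),
    ((-3, -5, -1), True),
    ((-1, -5, -1), True),   # negative upper bound -- failed before the fix
    ((0, -5, -1), False),
]

for args, expected in cases:
    assert in_range(*args) == expected, (args, expected)
print("all cases pass")
```

The point isn't that the tests are clever; it's that enumerating the tedious boundary cases costs the agent nothing.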

This is kind of like saying a kid can never become a better programmer than the average of their teachers.

IMHO, the reasons not to use AI are social, not logical.

  • The kid can learn and become better over time, while "AI" can only be retrained using better training data.

I'm not against using AI by any means, but I know what to use it for: stuff where I can only do worse than half the population because I can't be bothered to learn it properly. I don't want to toot my own horn, but I'd say I'm definitely better at my niche than 50% of the people. There are plenty of other niches where I'm not.

    • Yeah, but it's been trained on the boring, repetitive stuff, and A LOT of code that needs to be written is just boring, repetitive stuff.

      By leaving the busywork for the drones, this frees up time for the mind to solve the interesting and unsolved problems.

  • The AI doesn't know what good or bad code is. It doesn't know what surpassing someone means. It's been trained to generate text similar to its training data, and that's what it does.

    If you feed it only good code, we'd expect a better result, but currently we're feeding it average code. The cost to evaluate code quality for the huge data set is too high.

    • The training data includes plenty of examples of labelled good and bad code. And comparisons between two implementations plus trade-offs and costs and benefits. I think it absolutely does "know" good code, in the sense that it can know anything at all.

> Maybe you can claim that it's better than 60% of engineers because bottom-of-the-barrel engineers tend to not publish their works online for it to be used as training data, but for every one of those you have a bunch of non-engineers that don't do this for a living putting their shitty attempts at getting stuff done using code online, so I'm actually gonna correct myself immediately and say that it's about 40%.

And there are a bunch of engineers from certain cultures who don't know what they don't know, but believe that a massive portfolio of slop is better than one or two well-developed projects.

I can only hope that the people training the good coding models know to tell AI that these are antipatterns, not patterns.

>> Even speaking from a pure statistical perspective, it is quite literally impossible for "AI" that outputs world's-most-average-answer to be better than "most engineers". In fact, it's pretty easy to conclude what percentage of engineers it's better than: all it does is it consumes as much data as possible and returns the statistically most probable answer

Yeah, you come across as someone who thinks that the AI simply spits out the average of the code in its training data. I don't think that understanding is accurate, to say the least.