Comment by wickedsight

6 days ago

> The polarization comes from the very disparate coding experiences and output quality that different people find when using these tools.

Not just when using tools, also when using humans. The frame of reference of what is considered 'production code' differs immensely between organizations, teams and people. The code I get from LLMs is usually much better than what I get from my peers. Maybe not one shot, but after some steering it gets there.

It also isn't lazy. When generating test cases for relatively simple pieces of code, it usually tests pretty much every path and doesn't stop right at the 80% code coverage quality gate.

I can imagine if you're at the level of Linus or something, you might conclude differently, but most people aren't there at all.

> The frame of reference of what is considered 'production code' differs immensely between organizations, teams and people.

I think it’s really down to this. Nobody can agree on what counts as production-quality code. I remember joining a company with what I think (hope) most of us would call horrible quality code. It was an absolute mess, barely compiled with hundreds of warnings, and had an uncountable number of bugs. They didn’t even have a bug tracker, so nobody knew how many they had.

But the people working there already were so proud of it! None of them had ever worked for another company so they had no idea how bad their code was in comparison with the rest of the software industry (which itself is a very low bar). I told the founder we had a huge code quality problem and he looked at me like I had horns growing out of my head.

When someone says their LLM is producing “production-quality” code, actually look at it and see. Arguing about it on HN is pointless because everyone’s quality bar is different.

Absolutely! I find its test generation, properly steered, to be top notch. In many ways it's like having a second head, because it'll spontaneously come up with test paths that I'd normally only get to after a month or so in one of my "aha! What about XYZ?" shower thoughts.

You'll also notice that Linus doesn't poo-poo AI at all. His only gripe is with people using it wrong, such as flooding security lists with drive-by security reports after pointing their agent to the code and saying "find me some VULNS!!1!1!!"

> The code I get from LLMs is usually much better than what I get from my peers

Then you should seriously question who you're working for imo.

> It also isn't lazy.

It is indeed lazy in my experience, in the sense of being overzealous about creating useless test cases while ignoring the important ones. I don't want it to test a sum; I want a test that can "guarantee" me that a further change doesn't break existing code. Producing tests of that quality is HARD and requires a lot of steering with agents. This culture of test code coverage is just wrong: the best code base I worked with had code coverage only on the portion of the code that matters, and the rest was covered by static type checking and integration tests.
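
To make the "test a sum" versus "guard against a future change" distinction concrete, here is a rough sketch. Everything in it is hypothetical and only for illustration: `order_total`, its `discount_pct` parameter, and both test names are made up, not taken from any real code base.

```python
def order_total(prices, discount_pct=0):
    """Hypothetical function under test: sum prices, then apply a percentage discount."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    subtotal = sum(prices)
    return round(subtotal * (100 - discount_pct) / 100, 2)


def test_sum_only():
    # Low-value test: it adds coverage but mostly re-checks the built-in sum().
    assert order_total([1, 2, 3]) == 6


def test_discount_contract():
    # Higher-value test: pins down behavior a later change could silently break
    # (discount arithmetic, empty input, the 100% boundary, input validation).
    assert order_total([10.00, 5.50], discount_pct=10) == 13.95
    assert order_total([], discount_pct=50) == 0
    assert order_total([20], discount_pct=100) == 0
    try:
        order_total([20], discount_pct=101)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a discount above 100")
```

Both tests raise the coverage number, but only the second one would fail if someone later changed how the discount or its validation works, which is roughly the kind of test I have to steer the agent toward.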