← Back to context

Comment by indiv0

11 hours ago

This thread will become a typical "haha slop company made slop" but I've been bitten by a bug exactly like this before in a (pre-AI, artisan) OSS project. The maintainer there didn't properly account for DST when calculating last backup time, so the app started and never stopped writing/re-writing backups continuously.

Perhaps the framing shouldn't be "haha slop" but rather why doesn't the AI write better quality software than we do? To which the answer is obvious IMO -- even emergent properties can't elevate AI intelligence too far above the training dataset. So how do we get to superintelligent (or at least "not-wreck-your-NVMe-endurance-telligent") AI, if we, as a whole, are not smart enough ourselves?

Judge not the slop-bot, lest ye be judged yourself, engineer.

We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.

  • A singular human does (or tends to). Humans as a group, where members join and leave a group with time, also do learn, but at a much slower pace - over the years to decades timeframe. "X things programmers should know about Y" is a template for quite a few very influential blog posts, yet for most of them, you find many programmers, even decades later, who don't actually know what they "should".

    My experience was always that 90% of code is ugly and clunky. I'm not at all surprised, while reviewing AI-generated code, to see many of the same ugliness we regularly commit. The quality of the output code is now consistently average, which means it's basically shit in 90% of cases, but it tends to mostly work (in the general case). The same kind of shit I've seen people push to production thousands of times in my career.

    We don't fully know how to write good code. We don't really understand what good code should objectively look like. Spending more time on code doesn't automatically lead to better code (but costs a lot more). Above all, we don't need good code - the business side is perfectly fine with "good enough right now" rather than "maybe a lot better half a year from now". And that's what the models are trained on. They would, indeed, need quite a lot of "emergent properties" to go from that to consistently good code. ASI-level properties, I suspect.

  • > Difference is, humans learn from their mistakes.

    Great! So next time the human will prompt the agent to watch out for and avoid this bug.

    • > Great! So next time the human will prompt the agent to watch out for and avoid this bug.

      I actually created a system for something like this. The basic idea is, once you have identified what the issue was and fixed it, you can create lessons that lives inside the repository. Lessons are designed to be mapped to one or more files so if the LLM changes the files again, they can see what the issue was.

      The main challenge is being able to summarize and create proper tags so the AI after any code change can easily find the lesson.

Lack of accountability is the cause here. People don't think before hitting the 'Publish' button. Their managers let them off the hook because the culture still allows making egregious mistakes, as long as there's an LLM to blame.

1. I bet that developer only made that mistake one time in their life. Humans learn from their mistakes, LLMs don't. If you rely on LLMs to generate all of your code, you can expect to run into the same issues again and again.

2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.

  • LLMs do learn from mistakes. Not as directly from individual mistakes like humans do, but in aggregate the models have improved much more in the last year than most humans I know learn in the same time.

    • I don't like the reframing of 'learning from mistakes' from a human-like, near instantaneous feedback loop, to a year-long process of retraining on many traces collected from user data. They're different concepts and we should refer to them using different phrasing.

    • How many more times do I have to add variations of ”do not run any commands for the application without first entering the running container at `docker compose …`” to my AGENTS.md before it learns that node and phpunit is not available outside these containers?

  • > I have never written a bug that has destroyed my users' hardware, ...

    Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought---or reasoned---so. Maybe the decision was right at that time but the amount of TRACE logs have increased enormously. You will never know.

    • I love that we've moved the goalposts from "LLMs are better than artisanal software engineers" to "actually, shipping hardware-destroying bugs in production is literally unavoidable, nobody could possibly avoid doing it".

      1 reply →

What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.

  • I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

    Your comment, on the other hand, would be improved by including your own opinion on the matter.

    • > Each was clearly written by an experienced dev

      /s?

      They're clearly AI generated