Comment by fauigerzigerk

1 day ago

I agree that it's ultimately about the product.

But here's the problem. Five years ago, when someone on here said, "I wrote this non-trivial software", the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project.

Today, it does not necessarily imply that. We just don't know.

Even with LLMs, delivering software that consistently works requires quite a bit of work, and in most cases a certain level of expertise. Humans also write quite a bit of garbage code.

The shift to coding with LLMs is similar to how most people stopped writing assembly and moved to C and C++, then to garbage-collected and dynamically typed languages. People have always looked for ways to make programmers more productive.

Programming is evolving. LLMs are just the next generation of programming tools. They make programmers more productive, and in the majority of cases people and companies are going to use them more and more.

  • I'm not opposed to AI generated code in principle.

    I'm just saying that we don't know how much effort was put into making this and we don't know whether it works.

    The existence of a repository containing hundreds of files, thousands of SLOC, and a folder full of tests tells us less today than it used to.

    There's one thing in particular that I sometimes find quite astonishing. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases.

    What does that mean? The test cases are supposed to be the formal specification of our requirements. If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?
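
    To make that concrete, here is a minimal sketch in Python (the function and its expected behavior are purely illustrative, not taken from any real project): the human writes the tests as the record of what they expect, and the implementation, however it was produced, has to satisfy them.

        import re
        import unicodedata


        def make_slug(text: str) -> str:
            # Stand-in for the implementation under test; in the scenario
            # above, this is the part an LLM might generate.
            text = unicodedata.normalize("NFKD", text)
            text = text.encode("ascii", "ignore").decode()
            return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


        def test_make_slug():
            # Human-authored expectations, including the edge cases. If both
            # the tests and the implementation come from the same LLM run,
            # nothing here was ever independently specified.
            assert make_slug("Hello, World!") == "hello-world"
            assert make_slug("") == ""
            assert make_slug("---") == ""
            assert make_slug("Crème brûlée") == "creme-brulee"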

    • I fully agree with your overall message and sentiment. But let me be nit-picky for a moment.

      > The test cases are supposed to be the formal specification of our requirements

      Formal methods folks would strongly disagree with this statement. Tests are informal specifications in the sense that they don't provide a formal (mathematically rigorous) description of the full expected behavior of the system. Instead, they offer a mere glimpse into what we hope the system would do (see the sketch at the end of this comment).

      And that's the important part, and it's where your main point stands: the tests are what confirm that the thing the LLM built behaves the way the human expected in the cases they cared about. That's why the human needs to provide them.

      (The human could take the help of an LLM to write the tests, in the sense that they give an even-more-informal natural-language description of what the test should do. But the human then needs to make sure that the test really does that, and maybe fill in some gaps.)
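
      To make that distinction concrete, here is a sketch of a property-based test, written with Python's hypothesis library and reusing the illustrative make_slug stand-in from the comment above. It states rules over all inputs rather than a handful of examples, which is closer to a specification, though still far from a formal description of the full behavior.

          import re
          import unicodedata

          from hypothesis import given
          from hypothesis import strategies as st


          def make_slug(text: str) -> str:
              # Same illustrative stand-in as in the sketch upthread.
              text = unicodedata.normalize("NFKD", text)
              text = text.encode("ascii", "ignore").decode()
              return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


          @given(st.text())
          def test_slug_properties(text):
              slug = make_slug(text)
              # Rule 1: for any input, the output is empty or consists of
              # dash-separated runs of lowercase alphanumerics.
              assert re.fullmatch(r"(?:[a-z0-9]+(?:-[a-z0-9]+)*)?", slug)
              # Rule 2: slugifying twice changes nothing (idempotence).
              assert make_slug(slug) == slug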

    • > If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?

      You don’t. That’s the scary part. Up until now, this was somewhat solved by injecting artificial friction. A bank that takes 5 days for a payment to clear. And so on.

      But it’s worse than this, because most problems software solves cannot even be understood until you partially solve the problem. It’s the trying and failing that reveals the gap, usually to someone who only recognizes the gap because they were once embarrassed by it, and what they hear rhymes with their pain. AI doesn’t interface with physical reality, as far as we know, or have any mechanism to course-correct like embarrassment or pain.

      In the future, we will have flown off the cliff before we even know there was a problem. We will be on a spaceship going so fast that we can’t see the asteroid until it’s too late.

> Today we just don’t know

You never knew. There are plenty of intelligent, well-intentioned software engineers who publish FOSS that is buggy and doesn’t meet some arbitrary quality standards.

> the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project

That is entirely an assumption on the part of the reader. Nothing about someone saying "I built this complicated thing!" implies competence, or any desire to maintain it beyond building it.

The problem you're facing is survivorship bias. You can think of lots of examples where that has happened, and very few where it hasn't, because when the author of a project is incompetent or unmotivated, the project doesn't last long enough for you to hear about it twice.

  • >Nothing about someone saying "I built this complicated thing!" implies competence, or any desire to maintain it beyond building it.

    I disagree. The fact that someone has written a substantial amount of non-trivial code does imply a higher level of competence and motivation compared to not having done that.

In general, that is all implication and assumption, for any code, especially OSS code.

Hand-written code never implied much about quality, no matter the author, especially as we all use libraries of reusable code of varying quality.

  • Agree that just being hand-written doesn’t imply quality, but based on my priors, if something obviously looks like vibe-code it’s probably low quality.

    Most of the vibe-code I’ve seen so far appears functional, to the point that people will defend it, but if you take a closer look it’s a massively overcomplicated rat’s nest that would be difficult for a human to extend or maintain. Of course you could just use more AI, but that would only further amplify these problems.

  • Not much, but infinitely more than now.

    If someone puts weeks and months of their time into building something, then I'm willing to take that as proof of their motivation to create something good.

    I'm also willing to take the existence of non-trivial code that someone wrote manually as proof of some level of competence.

    The presence of motivation + competence makes it more likely that the result could be something good.

We know. It is not difficult to tell them apart. Good taste is apparent and beauty is universal. The amount of care and attention someone put into a craft is universally appreciated. Also, I am 100% confident this comment was the output of a human process. We can tell. There is something more. It is obvious for those that have a soul.

  • Exactly. It's like looking at assembly that's been written by a person vs by a compiler. There's just no soul in the latter! And that's why compilers never caught on.

  • We know if we make the effort to find out. But what we really want to know is not whether AI was used in the process of writing the software. What we want to know is whether it's worth checking out. That's what has become harder to know.