Comment by sexylinux

5 days ago

It still makes errors, yes? Then it is not usable, if we need to verify everything. AI is only interesting if it can do things that humans cannot do. If you can verify the results because you could do the work yourself, then why use AI? It will just tie up highly skilled people with verification work. These people should do the actual work instead; the results will come quicker.

So AI is only interesting to you / your org / humans if it can do things that you cannot achieve. But if it still makes errors, how could we ever know that some super-invention by AI is not wrong?

If we cannot rely on the correctness of the result, it is not usable at all. AI must always produce reliable and correct results. That has been a very fundamental requirement of computing. This problem has not been solved.

> AI is only interesting if it can do things that humans can not do.

AI is interesting as long as it can save time and/or money in getting an acceptable result. Anything that runs on a computer and can do "things that humans can do" will automatically end up doing things that humans won't do, simply by virtue of the fact that it runs on a machine that doesn't require sleep, doesn't get bored or demotivated, etc.

Verifying code (to a level where a responsible person is willing to take ownership of it) isn't trivial, sure; but writing the code by hand requires the same level of care, and the fact that the same person wrote it doesn't actually allow for shortcuts (if we're being properly responsible).

  • It doesn’t get bored or demotivated, but it also lacks interest and motivation generally, so it comes with the same pitfalls as anyone who has nothing to lose and is utterly unaccountable (e.g. destructive actions, lying, and being coercive or Machiavellian for no reason other than efficiency in reaching an arbitrary and artificial status of completion).

Humans make mistakes too. Does that mean humans are unusable? We accept as empirical fact that most production-quality code has 2 to 10 bugs per 1k LoC. According to your premise, virtually all existing software is therefore unusable.

What if an LLM overall starts to make fewer mistakes than an average developer, costs less than that developer's salary, and is 100x faster? For sure, the companies that leverage these with just a few senior devs doing prompting, testing and requirements analysis will outcompete other organizations.

  • Humans make mistakes and then learn from them. A really good expert would never deliberately copy-paste an obscure solution from the internet and then ask for forgiveness later.

    AI agents do that; perhaps not always, but they still do. Now the question: would I trust AI without verifying its output?

    • Humans also make mistakes in ways that other humans can understand or expect. Sometimes LLMs make mistakes in a way that makes you say “no human would have ever done that”.

    • You cannot trust human output without verification either. That's why you have tests, QA, staging envs, A/B tests...

There is plenty of work that does not need to be perfectly verified, because the risk is controlled. Prototyping a JavaScript game, for example. Or code that runs just on your local machine, where good enough is good enough. I'm sure a lot of you do super important work that needs 100% quality code all the time, but... some of us don't.

> Because it is not usable, if we need to verify everything.

Do you verify every line of code written by your fellow developers? I doubt it, which is strange, because they make errors, don't they?

What matters is the error rate. Once it drops below some threshold, they're better than the senior devs you don't supervise closely.

AI is like a junior developer. You have to review her code carefully but she is most definitely useful.

  • Why is your AI a she? What's up with gendering LLMs? Reminds me of Richard Dawkins calling Claude "Claudia" and insisting that it is conscious.

One does not need to be able to create something oneself to evaluate whether the output is correct. Consider, for example, that you can easily tell whether a meal tastes delicious without being an expert chef, or that NP problems are very difficult to solve, yet a proposed solution is easy to verify.
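
To make the NP point concrete, here is a minimal sketch using subset sum (my choice of example, not something anyone above mentioned). The names `verify` and `solve` are hypothetical. Checking a proposed answer takes one pass over it, while the naive search may have to try every subset:

```python
from itertools import combinations

def verify(numbers, target, indices):
    """Check a proposed answer: distinct, in-range indices whose values sum to target."""
    return (
        len(set(indices)) == len(indices)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )

def solve(numbers, target):
    """Brute-force search: may try up to 2**len(numbers) subsets."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return list(indices)
    return None  # no subset sums to target

numbers = [3, 34, 4, 12, 5, 2]
answer = solve(numbers, 9)           # expensive in general
print(verify(numbers, 9, answer))    # cheap, whoever (or whatever) produced the answer
```

The reviewer only needs `verify`; it does not matter whether `answer` came from a person, a brute-force loop, or an LLM.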