Comment by csallen

5 hours ago

For some reason, tons of people seem to be in camps at both extremes. It's either "AI sucks don't trust it!" or "AI is so much better than humans!"

But the most reasonable take, which I'm happy to see reflected in so many comments in this thread, is… use both.

Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI. Then the unique shortcomings of each party can be covered by the other's strengths.

AI review is never going to beat a fully resourced human review.

It might beat an underresourced human review, on time, efficiency, cost metrics. But on the metric of accuracy, throwing unlimited humans at a problem will still beat throwing unlimited AI at it

  • That's an irrelevant comparison because cost is always a constraint, so there are not going to be unlimited AI or humans. The question is how to optimally combine them for a given cost.

> Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI.

You can do that, sure. But doing so negates any improvements in speed the LLM brought. And at that point, you may as well just do it yourself to begin with.

  • When Google showed up on the scene I found I no longer needed to memorize basic syntax and other such things. If I couldn't remember on the fly, i'd just do a quick google search and move on. This freed space in my mind to instead focus on bigger & better things.

    I use GenAI tools when coding a lot, but I do not vibe code. I go through everything it generated, and we iterate. And yes, it doesn't save me a lot of time. But what it does do is free up mental capacity in a similar manner. But instead of syntax, it's more complicated patterns. Maybe I don't remember how to stitch something together, but i know it can be done. Instead of spending the time to look it up and then code it, I just tell it to do it for me.

  • Yeah, humans reviewing the AI review can only detect the false positives, where the LLM claims something is non-compliant and flags it for review/correction by a human or another agent. Human review can’t find the false negatives (true deficiencies not flagged) unless you do a full audit yourself to find whatever deficiencies the AI missed.

  • I feel like you're missing the point that it's more thorough to use both. Speed isn't the only factor that matters.

This makes sense, but a logical next step is to have one AI write code, and then have another AI, instead of humans, verify it.

Or are current AIs too similar for that to be fruitful?

  • This is commonly known as "LLM-as-a-judge" and anecdotally multiple people I know who write code using OpenRouter or using multiple models say it's surprisingly effective. It's strange that there don't appear to be any major papers on it since ~early 2025, which at this point is basically ancient history.